Calculating a Monte Carlo standard error (MCSE) is an important step in the statistical analysis of the simulation output obtained from a Markov chain Monte Carlo experiment. An MCSE is usually based on an estimate of the variance of the asymptotic normal distribution. We consider spectral and batch means methods for estimating this variance. In particular, we establish conditions which guarantee that these estimators are strongly consistent as the simulation effort increases. In addition, for the batch means and overlapping batch means methods we establish conditions ensuring consistency in the mean-square sense, which in turn allows us to calculate the optimal batch size up to a constant of proportionality. Finally, we examine the empirical finite-sample properties of spectral variance and batch means estimators and provide recommendations for practitioners.

1. Introduction. Suppose $\pi$ is a probability distribution with support $\mathsf{X}$ and the goal is to calculate $E_\pi g := \int_{\mathsf{X}} g(x)\,\pi(dx)$, where $g$ is a real-valued, $\pi$-integrable function. In many situations $\pi$ is sufficiently complex that we may have to rely on Markov chain Monte Carlo (MCMC) methods to estimate $E_\pi g$. As is now widely recognized (Liu [30], Robert and Casella [41]), we can often simulate a Harris ergodic (i.e., aperiodic, $\pi$-irreducible, positive Harris recurrent) Markov chain on $\mathsf{X}$ having invariant distribution $\pi$ and easily estimate $E_\pi g$. Specifically, letting $X = \{X_1, X_2, X_3, \ldots\}$ denote such a Markov chain, with probability one and for any initial distribution

(1) $\bar g_n := \frac{1}{n}\sum_{i=1}^{n} g(X_i) \to E_\pi g$ as $n \to \infty$.

The approximate sampling distribution of the Monte Carlo error, $\bar g_n - E_\pi g$, is available via a Markov chain central limit theorem (CLT) if there exists a constant $\sigma_g^2 \in (0, \infty)$ such that

(2) $\sqrt{n}(\bar g_n - E_\pi g) \xrightarrow{d} N(0, \sigma_g^2)$ as $n \to \infty$.

The conditions we will require in our theoretical work below are sufficient to 
guarantee (2) for any initial distribution. In fact, our conditions will imply 
the stronger Markov chain functional central limit theorem. See Jones [23] 
and Roberts and Rosenthal [44] for broader discussion of the conditions for 
a Markov chain CLT. 

Obtaining a good estimate of $\sigma_g^2$, say $\hat\sigma_n^2$, is an important step in the statistical analysis of the observed sample path for at least two reasons: (1) it can be used to construct asymptotically valid confidence intervals for $E_\pi g$, and (2) it is a key component of rigorous rules for deciding when to terminate the simulation. We describe these approaches more fully below, but the interested reader is directed to Flegal, Haran and Jones [12] and Jones et al. [24] for more detail and comparisons with other methods. In particular, these papers demonstrate that terminating the simulation based on $\hat\sigma_n^2$ is superior to the common practice of terminating based on convergence diagnostics.

The simplest approach to stopping an MCMC experiment is a fixed-time rule. Specifically, the Markov chain simulation is run for a predetermined number of iterations, using $\bar g_n$ to estimate $E_\pi g$. If $\hat\sigma_n^2$ is a consistent estimator of $\sigma_g^2$, a valid Monte Carlo standard error (MCSE) of $\bar g_n$ is given by $\hat\sigma_n/\sqrt{n}$, and an asymptotically valid interval estimator of $E_\pi g$ is given in the usual way by

$\bar g_n \pm t_* \dfrac{\hat\sigma_n}{\sqrt{n}}$,

where $t_*$ is an appropriate Student's t quantile. Reporting an interval estimate of $E_\pi g$, or at least the MCSE, allows independent evaluation of the quality of the reported point estimate. Of course, the interval estimate may be undesirably wide, in which case the simulation should continue. This naturally leads to a sequential approach that terminates the simulation the first time the interval is sufficiently narrow, i.e., a fixed-width rule, where the total simulation effort is random. Formally, the user specifies a desired half-width $\epsilon$ and the simulation terminates the first time the following inequality is satisfied:

(3) $t_* \dfrac{\hat\sigma_n}{\sqrt{n}} + p(n) \le \epsilon$,



where $p(n)$ is a positive function such that $p(n) = o(n^{-1/2})$ as $n \to \infty$. Glynn and Whitt [17] established that if a functional central limit theorem holds and $\hat\sigma_n^2 \to \sigma_g^2$ with probability 1 as $n \to \infty$, then the interval at (3) is asymptotically valid in the sense that the desired coverage probability is obtained as $\epsilon \to 0$. Moreover, the role of $p(n)$ is to ensure that the simulation effort $n$ is large for small values of $\epsilon$. Letting $n^*$ be the desired minimum simulation effort, an often useful choice is $p(n) = \epsilon I(n < n^*) + n^{-1}$, where $I(\cdot)$ is the usual indicator function.
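The fixed-width rule (3) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a simple AR(1) chain (the model of Section 4.1) as a stand-in for an MCMC sampler, estimates $\sigma_g^2$ by nonoverlapping batch means with $b_n = \lfloor n^{1/2} \rfloor$, uses a Student's t quantile with $a_n - 1$ degrees of freedom, and, as an assumed growth schedule, adds about 10% more iterations each time (3) fails. The helper names `ar1_extend`, `bm_variance` and `fixed_width` are hypothetical.

```python
import numpy as np
from scipy import stats


def ar1_extend(x_last, m, rho, rng):
    """Simulate m more steps of X_i = rho * X_{i-1} + e_i, e_i ~ N(0, 1), starting after x_last."""
    out = np.empty(m)
    prev = x_last
    for i in range(m):
        prev = rho * prev + rng.standard_normal()
        out[i] = prev
    return out


def bm_variance(x, b):
    """Nonoverlapping batch means estimate of sigma_g^2 with batch size b (partial last batch dropped)."""
    a = len(x) // b
    means = x[: a * b].reshape(a, b).mean(axis=1)
    return b * np.sum((means - means.mean()) ** 2) / (a - 1)


def fixed_width(eps, n_star, rho=0.5, n_init=1000, seed=0, max_n=10**7):
    """Simulate until t* * sigma_n / sqrt(n) + p(n) <= eps, with p(n) = eps * I(n < n_star) + 1/n."""
    rng = np.random.default_rng(seed)
    x = ar1_extend(0.0, n_init, rho, rng)
    while True:
        n = len(x)
        b = int(np.sqrt(n))               # b_n = floor(n^{1/2})
        a = n // b                        # number of batches a_n
        half = stats.t.ppf(0.975, df=a - 1) * np.sqrt(bm_variance(x, b) / n)
        p_n = eps * (n < n_star) + 1.0 / n
        if half + p_n <= eps or n >= max_n:
            return x.mean(), half, n
        # Criterion not met: simulate about 10% more iterations and check again.
        x = np.concatenate([x, ar1_extend(x[-1], int(np.ceil(0.1 * n)), rho, rng)])


est, half, n = fixed_width(eps=0.1, n_star=1000)
print(est, half, n)
```

For $\rho = 0.5$ this chain has $\sigma^2 = 4$, so with $\epsilon = 0.1$ the rule should stop once $n$ is on the order of $(2 \times 2 / 0.1)^2 = 1600$ iterations, give or take the noise in $\hat\sigma_n^2$.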

Due to the inherent serial correlation in the Markov chain, generally $\sigma_g^2 \neq \text{Var}_\pi g$, and hence estimating $\sigma_g^2$ requires specialized techniques such as nonoverlapping batch means (BM), overlapping batch means (OBM), spectral variance (SV) methods and regenerative simulation (RS). We study the asymptotic properties (specifically, strong consistency and mean-square consistency) and the finite-sample properties of the BM, OBM and SV procedures. Strong consistency of the BM and RS estimators was addressed by Jones et al. [24] and Hobert et al. [20], respectively. In the current work we develop conditions for the strong consistency of the OBM and SV procedures. More specifically, we require that the Markov chain be geometrically ergodic, while existing results on the consistency of OBM and SV methods require the chain to be uniformly ergodic; for a definition of geometric and uniform ergodicity see Tierney [50]. Consistency of RS and BM also requires geometric ergodicity; however, SV methods require a slightly stronger moment condition on $g$ than RS, BM and OBM. Overall, as we discuss in Sections 2 and 3, our results significantly weaken the existing regularity conditions guaranteeing strong consistency for SV and OBM. It is also worth emphasizing that the results on strong consistency of BM, OBM, RS and SV do not require a stationary Markov chain, and hence from a theoretical perspective burn-in is not required. Of course, the initial distribution should be carefully chosen since it will impact the finite-sample properties of the chain and the resulting output analysis.

Establishing that a given Markov chain is geometrically or uniformly ergodic can be challenging. On the other hand, there has been a substantial amount of effort directed towards doing just this in the context of MCMC. For example, we know that Metropolis-Hastings samplers with state-independent proposals can be uniformly ergodic (Tierney [50]), but such situations are uncommon in realistic settings. Standard random walk Metropolis-Hastings chains on $\mathbb{R}^d$, $d \ge 1$, cannot be uniformly ergodic but may still be geometrically ergodic (see Mengersen and Tweedie [35]). An incomplete list of other research on establishing convergence rates of Markov chains used in MCMC is given by Geyer [13], Jarner and Hansen [21], Meyn and Tweedie [37] and Neath and Jones [39], who considered Metropolis-Hastings algorithms, and Hobert and Geyer [19], Hobert et al. [20], Johnson and Jones [22], Jones and Hobert [26], Marchev and Hobert [33], Roberts and Polson [42], Roberts and Rosenthal [43], Rosenthal [45, 46], Roy and Hobert [47], Tan and Hobert [49] and Tierney [50], who examined Gibbs samplers.

Optimal batch size selection for BM and OBM is a long-standing open problem. Song and Schmeiser [48] propose an approach that minimizes the asymptotic mean-squared error. Accordingly, we also prove that the BM and OBM estimators are mean-square consistent, that is,

(4) $\text{MSE}(\hat\sigma_n^2) := E_\pi(\hat\sigma_n^2 - \sigma_g^2)^2 \to 0$ as $n \to \infty$.

Our work on establishing (4) allows us to argue that the asymptotically optimal batch size in terms of MSE is proportional to $n^{1/3}$. This is similar to the conclusions of others (Chien, Goldsman and Melamed [6], Song and Schmeiser [48]); however, our mixing and moment conditions are much weaker. Specifically, the previous work requires a uniformly ergodic Markov chain and an absolute twelfth moment, whereas we require only geometric ergodicity and a bit more than a fourth moment.

The BM, OBM, RS and SV procedures are all easy to implement. RS is sometimes viewed as the standard by which others should be judged (see, e.g., Bratley, Fox and Schrage [4], Jones and Hobert [25]). However, RS may require additional theoretical work, which could be an obstacle to some, and moreover has been found to be problematic in high-dimensional settings (Gilks, Roberts and Sahu [14]) and in variable-at-a-time Metropolis-Hastings settings (Neath and Jones [39]). Of the alternatives to RS, OBM and SV procedures have the reputation of being more efficient than BM. For example, results in Section 3.3 show that the ratio of the variance of the OBM estimator to the variance of the BM estimator converges to 2/3 as the simulation effort increases. However, asymptotics alone do not appear to give a clear picture of which method should be preferred. Hence we investigate the empirical finite-sample properties of these methods in the context of two examples. Discussion of our findings is given in Section 4.3 but, on balance, some of the SV methods appear to be superior.

The rest of this paper is organized as follows. In Section 2 we establish 
strong consistency of SV procedures. Next, Section 3 contains a discussion 
of the asymptotic properties of BM and OBM procedures. The finite sample 
properties of the various methods are studied in two examples in Section 
4 where we also give some general recommendations. Finally, most of the 
technical details and proofs are presented in appendices. 

2. Spectral estimation. In this section, we define a class of estimators of $\sigma_g^2$ and establish conditions which guarantee their strong consistency. First, define the lag $s$ autocovariance $\gamma(s) = \gamma(-s) := E_\pi[Y_t Y_{t+s}]$, where $Y_i := g(X_i) - E_\pi g$ for $i = 1, 2, 3, \ldots$. Consider estimating $\gamma(s)$ with

$\gamma_n(s) = \gamma_n(-s) := n^{-1} \sum_{t=1}^{n-s} (Y_t - \bar Y_n)(Y_{t+s} - \bar Y_n)$,

where $\bar Y_n := n^{-1} \sum_{i=1}^{n} Y_i$. If $E_\pi g^2 < \infty$, then for fixed $s$, $\gamma_n(s) \to \gamma(s)$ w.p. 1 as $n \to \infty$.

One could use the sum of the $\gamma_n(s)$ to estimate $\sigma_g^2$, though this turns out to be a poor estimator (see Anderson [1] and Bratley, Fox and Schrage [4]). Instead we will investigate a truncated and weighted version, called the spectral variance estimator,

$\hat\sigma_S^2 := \sum_{s=-(b_n-1)}^{b_n-1} w_n(s)\gamma_n(s)$,

where $w_n(\cdot)$ is the lag window and $b_n$ is the truncation point.
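A direct implementation of $\hat\sigma_S^2$ may help fix ideas. The sketch below assumes the Tukey-Hanning and modified Bartlett windows discussed in Remark 3 and uses the symmetry $w_n(-s) = w_n(s)$, $\gamma_n(-s) = \gamma_n(s)$ to write the estimator as $\gamma_n(0) + 2\sum_{s=1}^{b_n-1} w_n(s)\gamma_n(s)$. The function names are illustrative, and the AR(1) generator is included only to exercise the code.

```python
import numpy as np


def autocov(y, s):
    """gamma_n(s) = n^{-1} * sum_{t=1}^{n-s} (Y_t - Ybar_n)(Y_{t+s} - Ybar_n)."""
    n = len(y)
    d = y - y.mean()
    return np.dot(d[: n - s], d[s:]) / n


def tukey_hanning(s, b):
    """Blackman-Tukey window with a = 1/4: 0.5 + 0.5 cos(pi |s| / b) for |s| < b, else 0."""
    s = np.abs(np.asarray(s, dtype=float))
    return np.where(s < b, 0.5 + 0.5 * np.cos(np.pi * s / b), 0.0)


def modified_bartlett(s, b):
    """Parzen window with q = 1: 1 - |s| / b for |s| < b, else 0."""
    s = np.abs(np.asarray(s, dtype=float))
    return np.where(s < b, 1.0 - s / b, 0.0)


def sv_variance(y, b, window):
    """Spectral variance estimator: sum over |s| < b of w_n(s) * gamma_n(s)."""
    y = np.asarray(y, dtype=float)
    lags = np.arange(1, b)
    gammas = np.array([autocov(y, s) for s in lags])
    return autocov(y, 0) + 2.0 * np.sum(window(lags, b) * gammas)


def simulate_ar1(n, rho, seed=0):
    """Stationary AR(1) path X_i = rho X_{i-1} + e_i, used only for testing (sigma^2 = 1/(1-rho)^2)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1.0 - rho ** 2)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + rng.standard_normal()
    return x
```

For a stationary AR(1) chain with $\rho = 0.5$ the target is $\sigma^2 = 1/(1-\rho)^2 = 4$, so `sv_variance(simulate_ar1(100_000, 0.5), 316, tukey_hanning)` should return a value near 4.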

Our theoretical results require the following assumptions on the lag window, the truncation point and the one-step Markov transition kernel associated with $X$, denoted $P(x, A)$.

Assumption 1. The lag window $w_n(\cdot)$ is an even function defined on the integers such that

$|w_n(s)| \le 1$ for all $n$ and $s$,
$w_n(0) = 1$ for all $n$,
$w_n(s) = 0$ for all $|s| \ge b_n$.

Assumption 2. Let $b_n$ be an integer sequence such that $b_n \to \infty$ and $n/b_n \to \infty$ as $n \to \infty$, where $b_n$ and $n/b_n$ are monotonically nondecreasing.

Assumption 3. There exists a function $s: \mathsf{X} \to [0, 1]$ and a probability measure $Q$ such that $P(x, \cdot) \ge s(x)Q(\cdot)$ for all $x \in \mathsf{X}$.

The main result of this section follows. 

Theorem 1. Let $X$ be a geometrically ergodic Markov chain with invariant distribution $\pi$ and let $g: \mathsf{X} \to \mathbb{R}$ be a Borel function with $E_\pi |g|^{4+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$. Suppose Assumptions 1, 2 and 3 hold and define $\Delta_1 w_n(k) = w_n(k-1) - w_n(k)$ and $\Delta_2 w_n(k) = w_n(k-1) - 2w_n(k) + w_n(k+1)$. Further suppose (a) $b_n n^{-1} \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \to 0$ as $n \to \infty$; (b) there exists a constant $c \ge 1$ such that $\sum_n (b_n/n)^c < \infty$; (c) $b_n n^{-1} \log n \to 0$ as $n \to \infty$; (d)

$b_n n^{2\alpha} (\log n)^3 \left( \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \right)^2 \to 0$ as $n \to \infty$

and

$n^{2\alpha} (\log n)^2 \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \to 0$ as $n \to \infty$,

where $\alpha = 1/(4 + \delta)$; and (e) $b_n^{-1} n^{2\alpha} \log n \to 0$ as $n \to \infty$. Then for any initial distribution, with probability 1, $\hat\sigma_S^2 \to \sigma_g^2$ as $n \to \infty$.

Proof. See Appendix B.1. □

Remark 1. It is convenient in applications to take $b_n = \lfloor n^\nu \rfloor$ for some $0 < \nu < 1$, in which case conditions (b) and (c) of Theorem 1 are automatically satisfied.

Remark 2. Assumption 3 is not critical. First, we do not require the actual value of $s$ or $Q$ at any point in this paper. Thus, unlike RS, which is entirely based on Assumption 3, there is no practical point in searching for a good minorization. Second, recall that fundamental Markov chain theory (see Chapter 5 of Meyn and Tweedie [36]) ensures the existence of an integer $n_0$ for which a minorization condition holds for the $n_0$-step kernel $P^{n_0}$. If we cannot establish the one-step minorization in Assumption 3 but can establish an $n_0$-step minorization, then we would just use the chain with kernel $P^{n_0}$. This is reasonable since $P^{n_0}$ inherits the stability properties of $P$.

Remark 3. Anderson [1] gives an extensive collection of lag windows satisfying Assumption 1. It is useful to consider the applicability of conditions (a) and (d) of Theorem 1 for some of these windows.

Simple truncation: set $w_n(k) = I(|k| < b_n)$. Then it is easy to see that condition (d) requires $4 b_n n^{2\alpha} (\log n)^3 \to 0$, which obviously cannot hold.

Blackman-Tukey: let $w_n(k) = [1 - 2a + 2a\cos(\pi|k|/b_n)] I(|k| < b_n)$ where $a > 0$. When $a = 1/4$ this is the Tukey-Hanning window. It follows easily from Lemma 7 in Appendix B.2 that condition (a) holds if $b_n^2 n^{-1} \to 0$ as $n \to \infty$, while condition (d) is satisfied if $b_n^{-1} n^{2\alpha} (\log n)^3 \to 0$ as $n \to \infty$.

Parzen: let $w_n(k) = [1 - |k|^q / b_n^q] I(|k| < b_n)$ for $q \in \mathbb{Z}^+ = \{1, 2, 3, \ldots\}$. When $q = 1$ this window is the modified Bartlett window, which deserves to be singled out because of its connection to the method of overlapping batch means, which we will consider later. In this case $\Delta_1 w_n(k) = \Delta_2 w_n(b_n) = b_n^{-1}$ for $k = 1, \ldots, b_n$, so that condition (a) requires $b_n^2 n^{-1} \to 0$ as $n \to \infty$. Next, note that $\Delta_2 w_n(k) = 0$ for $k = 1, 2, \ldots, b_n - 1$. Thus condition (d) is satisfied if $b_n^{-1} n^{2\alpha} (\log n)^3 \to 0$ as $n \to \infty$.

When $q \ge 2$ it is easy to show that the conditions of Lemma 7 in Appendix B.2 hold, and hence that conditions (a) and (d) will be satisfied under exactly the same conditions as required for the modified Bartlett lag window, that is, if $b_n^2 n^{-1} \to 0$ and $b_n^{-1} n^{2\alpha} (\log n)^3 \to 0$ as $n \to \infty$.

Scale-parameter modified Bartlett: let $w_n(k) = [1 - \lambda|k|/b_n] I(|k| < b_n)$, where $\lambda \neq 1$ is a positive constant. Then $\Delta_1 w_n(k) = \lambda b_n^{-1}$ for $k = 1, \ldots, b_n - 1$ and $\Delta_1 w_n(b_n) = 1 - \lambda + \lambda b_n^{-1}$, so that condition (a) becomes $(1 - \lambda) b_n^2 n^{-1} + \lambda b_n (b_n + 1)(2n)^{-1} \to 0$, that is, $b_n^2 n^{-1} \to 0$ as $n \to \infty$. On the other hand, there is trouble with condition (d). Note that $\Delta_2 w_n(b_n) = 1 - \lambda + \lambda b_n^{-1}$ and $\Delta_2 w_n(b_n - 1) = -1 + \lambda$, but $\Delta_2 w_n(k) = 0$ for $k = 1, 2, \ldots, b_n - 2$. Hence, as $n \to \infty$, $\sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \to 2|1 - \lambda| \neq 0$. Thus condition (d) cannot hold.
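The window calculations in Remark 3 can be checked numerically. This sketch computes $\sum_{k=1}^{b} k|\Delta_1 w_n(k)|$ (the sum in condition (a)) and $\sum_{k=1}^{b} |\Delta_2 w_n(k)|$ (the sum in condition (d)) for four windows; the scale-parameter Bartlett window is evaluated with an assumed $\lambda = 0.5$, and all function names are hypothetical.

```python
import numpy as np


def delta_sums(window, b):
    """Return (sum_k k|Delta_1 w(k)|, sum_k |Delta_2 w(k)|) over k = 1, ..., b."""
    k = np.arange(1, b + 1, dtype=float)
    d1 = window(k - 1, b) - window(k, b)
    d2 = window(k - 1, b) - 2.0 * window(k, b) + window(k + 1, b)
    return np.sum(k * np.abs(d1)), np.sum(np.abs(d2))


def simple_truncation(s, b):
    return np.where(np.abs(s) < b, 1.0, 0.0)


def modified_bartlett(s, b):
    return np.where(np.abs(s) < b, 1.0 - np.abs(s) / b, 0.0)


def scaled_bartlett(s, b, lam=0.5):
    return np.where(np.abs(s) < b, 1.0 - lam * np.abs(s) / b, 0.0)


def tukey_hanning(s, b):
    return np.where(np.abs(s) < b, 0.5 + 0.5 * np.cos(np.pi * np.abs(s) / b), 0.0)


windows = [("truncation", simple_truncation), ("Bartlett", modified_bartlett),
           ("scaled Bartlett", scaled_bartlett), ("Tukey-Hanning", tukey_hanning)]
for name, w in windows:
    for b in (50, 500):
        s1, s2 = delta_sums(w, b)
        print(f"{name:16s} b={b:4d}  sum|D2|={s2:.5f}  b*sum|D2|={b * s2:.3f}")
```

For simple truncation the second sum is 2 for every $b$, for the modified Bartlett window it is $1/b$, for the scale-parameter Bartlett window it stays near $2|1-\lambda|$, and for Tukey-Hanning it decays like $\pi/b$ (a Taylor expansion of the cosine gives $|\Delta_2 w_n(k)| \approx (\pi/b)^2 |\cos(\pi k/b)|/2$), in agreement with the remark.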

Remark 4. Damerdji [9, 10] has previously addressed strong consistency of $\hat\sigma_S^2$. However, our result substantially weakens the regularity conditions for Harris ergodic Markov chains. In particular, Damerdji's approach requires a uniformly ergodic chain, whereas Theorem 1 requires only geometric ergodicity. Also, instead of condition (d), Damerdji's result requires

(5) $b_n n^{1-2\alpha'} (\log n) \left( \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \right)^2 \to 0$ as $n \to \infty$, and $n^{1-2\alpha'} \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \to 0$ as $n \to \infty$,

where $0 < \alpha' < (\delta - 2 + \epsilon)/(24 + 12(\delta - 2 + \epsilon))$. This requirement is not particularly useful when $b_n = \lfloor n^\nu \rfloor$. For example, consider using the modified Bartlett lag window. Then, just as in Theorem 1, Damerdji requires $n^{2\nu - 1} \to 0$ as $n \to \infty$, but (5) requires $n^{1 - 2\alpha' - \nu} (\log n) \to 0$ as $n \to \infty$. Since $\alpha' < 1/12$, the first requirement forces $\nu < 1/2$ while the second forces $\nu > 1 - 2\alpha' > 5/6$, so there is no $\nu$ value that satisfies both.

Finally, Damerdji required what we view to be an unnatural regularity condition. Specifically, for large enough $n$, $b_n^{-1} \sum_{i=n-b_n+1}^{n} Y_i^2$ is almost surely bounded above. An inspection of our proof will show that we can weaken the moment condition to $E_\pi |g|^{2+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$ by making the same assumption as Damerdji.

3. Batch means. In nonoverlapping batch means the output is broken into blocks of equal size. Suppose the algorithm is run for a total of $n = a_n b_n$ iterations and for $k = 0, \ldots, a_n - 1$ define $\bar Y_k := b_n^{-1} \sum_{i=1}^{b_n} Y_{k b_n + i}$. The BM estimate of $\sigma_g^2$ is

(6) $\hat\sigma_{BM}^2 = \dfrac{b_n}{a_n - 1} \sum_{k=0}^{a_n - 1} (\bar Y_k - \bar Y_n)^2$.
It is well known that, when the number of batches is held fixed, (6) is generally not a consistent estimator of $\sigma_g^2$ (Glynn and Iglehart [15], Glynn and Whitt [16]). On the other hand, Jones et al. [24] show that if the batch size and number of batches are allowed to increase as the overall length of the simulation does (e.g., by setting $a_n = b_n = \lfloor n^{1/2} \rfloor$), then $\hat\sigma_{BM}^2 \to \sigma_g^2$ with probability one as $n \to \infty$. However, Jones et al. [24] found that the finite-sample properties can be less desirable than expected; thus we consider OBM.
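The nonoverlapping batch means estimator (6) is short to code. This is a minimal NumPy sketch, assuming that a partial final batch is simply discarded so that $n = a_n b_n$; `bm_variance` is a hypothetical name, and the example applies the sampling plan $a_n = b_n = \lfloor n^{1/2} \rfloor$ to i.i.d. data, for which $\sigma_g^2 = 1$.

```python
import numpy as np


def bm_variance(y, b):
    """Nonoverlapping batch means estimator: b/(a-1) * sum_k (Ybar_k - Ybar_n)^2 with n = a * b.

    Any incomplete final batch is discarded.
    """
    y = np.asarray(y, dtype=float)
    a = len(y) // b
    if a < 2:
        raise ValueError("batch means needs at least two batches")
    y = y[: a * b]
    batch_means = y.reshape(a, b).mean(axis=1)      # Ybar_k, k = 0, ..., a - 1
    return b * np.sum((batch_means - y.mean()) ** 2) / (a - 1)


rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)                     # i.i.d. N(0, 1): sigma_g^2 = 1
print(bm_variance(y, int(np.sqrt(len(y)))))          # a_n = b_n = 100
```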

OBM is a generalization of BM, but it is also well known that the OBM estimator is equal, except for some end-effect terms, to the SV estimator arising from the modified Bartlett lag window, a relationship we exploit later. Note that there are $n - b_n + 1$ batches of length $b_n$, indexed by $j$ running from 0 to $n - b_n$, and define $\bar Y_j(b_n) := b_n^{-1} \sum_{i=1}^{b_n} Y_{j+i}$. The OBM estimator of $\sigma_g^2$ results from averaging across all batches and is defined as

(7) $\hat\sigma_{OBM}^2 := \dfrac{n b_n}{(n - b_n)(n - b_n + 1)} \sum_{j=0}^{n - b_n} (\bar Y_j(b_n) - \bar Y_n)^2$.
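The OBM estimator can be computed from cumulative sums, since each overlapping batch mean is a difference of two partial sums, rather than by averaging every batch separately. The sketch below (hypothetical function names, i.i.d. test data) also computes the modified Bartlett spectral variance estimator directly, so the claim that the two agree up to end-effect terms can be checked empirically.

```python
import numpy as np


def obm_variance(y, b):
    """Overlapping batch means estimator using all n - b + 1 batches of length b."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    batch_means = (csum[b:] - csum[:-b]) / b         # Ybar_j(b), j = 0, ..., n - b
    ss = np.sum((batch_means - y.mean()) ** 2)
    return n * b * ss / ((n - b) * (n - b + 1))


def bartlett_sv(y, b):
    """Spectral variance estimator with the modified Bartlett window w_n(s) = 1 - |s|/b."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    total = np.dot(d, d) / n
    for s in range(1, b):
        total += 2.0 * (1.0 - s / b) * np.dot(d[: n - s], d[s:]) / n
    return total


rng = np.random.default_rng(1)
y = rng.standard_normal(20_000)
print(obm_variance(y, 100), bartlett_sv(y, 100))   # should differ only by end effects
```

With $n = 20{,}000$ and $b_n = 100$ the two printed values should agree to within a few percent.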

The next result establishes strong consistency of the OBM estimator. 

Theorem 2. Let $X$ be a geometrically ergodic Markov chain with invariant distribution $\pi$ but any initial distribution, and let $g: \mathsf{X} \to \mathbb{R}$ be a Borel function with $E_\pi |g|^{2+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$. Suppose Assumptions 2 and 3 hold. Further suppose (a) there exists a constant $c \ge 1$ such that $\sum_n (b_n/n)^c < \infty$; (b) $b_n n^{-1} \log n \to 0$ as $n \to \infty$; (c) $n^{2\alpha} (\log n)^3 / b_n \to 0$ as $n \to \infty$, where $\alpha = 1/(2 + \delta)$; (d) there exists an integer $n_0$ and a constant
$c_1$ such that for all $n \ge n_0$ we have $\log n / b_n \le c_1$; and (e) $b_n^2 n^{-2} \log\log n \to 0$ and $b_n n^{-3} \log\log n \to 0$ as $n \to \infty$. Then, as $n \to \infty$, $\hat\sigma_{OBM}^2 \to \sigma_g^2$ w.p. 1.

For the proof, see Appendix B.3. 

Remark 5. It is possible to obtain strong consistency of $\hat\sigma_{OBM}^2$ directly from Theorem 1 using the modified Bartlett lag window. However, Theorem 1 requires a stronger moment condition on $g$. In addition, meeting condition (a) of Theorem 1 would require $b_n^2 n^{-1} \to 0$ as $n \to \infty$; hence the conditions of Theorem 2 are weaker.

Remark 6. Chan and Geyer [5] established a CLT as at (2) under the same moment and mixing conditions stated in our Theorem 2. Moreover, without assuming reversibility, this moment condition cannot be weakened to just a second moment for geometrically ergodic chains (see Häggström [18]).
Corollary 1. Let $X$ be a geometrically ergodic Markov chain with invariant distribution $\pi$ and let $g: \mathsf{X} \to \mathbb{R}$ be a Borel function with $E_\pi |g|^{2+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$. Suppose Assumption 3 holds and $b_n = \lfloor n^\nu \rfloor$, where $3/4 > \nu > (1 + \delta/2)^{-1}$. Then $\hat\sigma_{OBM}^2 \to \sigma_g^2$ w.p. 1.

Proof. This follows easily by verifying the conditions of Theorem 2. □

Remark 7. Damerdji [10] and Jones et al. [24] show that the nonoverlapping batch means estimator $\hat\sigma_{BM}^2$ is strongly consistent for uniformly and geometrically ergodic chains, respectively, under conditions similar to those required for Theorem 2. For example, Jones et al. [24] show that if $b_n = \lfloor n^\nu \rfloor$, then strong consistency requires $1 > \nu > (1 + \delta/2)^{-1}$ for $\delta > 0$, resulting in weaker conditions on $\nu$ than those required in Corollary 1.

3.1. Mean-square consistency. We now turn our attention to showing that $\hat\sigma_n^2$ is consistent in the mean-square sense, that is, (4). Recall that strong consistency and mean-square consistency do not generally imply each other; hence we cannot directly appeal to the results in the previous section, although they will be useful in our proofs.

There are some existing results on this problem. For example, consider the following results from Chien, Goldsman and Melamed [6]. If $X$ is a stationary uniformly ergodic Markov chain, Assumption 2 holds, and $E_\pi |g|^{12} < \infty$, then as $b_n \to \infty$ (whether $a_n \to \infty$ or not)

(8) $b_n \text{Bias}[\hat\sigma_{BM}^2] = \Gamma + o(1)$,

where $\Gamma := -2 \sum_{s=1}^{\infty} s\gamma(s)$, which is well defined and finite. Also, if $a_n \to \infty$ and $b_n \to \infty$,

(9) $\dfrac{n}{b_n} \text{Var}(\hat\sigma_{BM}^2) = 2\sigma_g^4 + o(1)$.

Combining (8) and (9) implies (4) for $\hat\sigma_{BM}^2$ as $a_n \to \infty$ and $b_n \to \infty$. The next result establishes (4), where $\hat\sigma_n^2$ is $\hat\sigma_{BM}^2$ or $\hat\sigma_{OBM}^2$, under weaker mixing and moment conditions.

Theorem 3. Let $\hat\sigma_n^2$ be either the BM or OBM estimator of $\sigma_g^2$, and let $X$ be a stationary geometrically ergodic Markov chain with invariant distribution $\pi$ and $g: \mathsf{X} \to \mathbb{R}$ be a Borel function with $E_\pi |g|^{4+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$. Suppose Assumptions 2 and 3 hold and $E_\pi C^4 < \infty$, where $C$ is defined at (14). If $b_n^{-1} n^{2\alpha} (\log n)^2 \to 0$ as $n \to \infty$, where $\alpha = 1/(4 + \delta)$, then $\text{MSE}(\hat\sigma_n^2) \to 0$ as $n \to \infty$.
Remark 8. The proof of Theorem 3 together with results in Damerdji [11] shows that the conclusions also hold for uniformly ergodic chains if we replace the above condition on $b_n$ with $b_n^{-1} n^{1-2\alpha'} (\log n) \to 0$ as $n \to \infty$, where $\alpha' < (\delta + 2 + \epsilon)/(24 + 12(\delta + 2 + \epsilon))$.

Remark 9. Results in Damerdji [11] and Philipp and Stout [40] show that when the Markov chain is uniformly ergodic

$C(\omega) = 2 + \sum_{i=\tau_1(\omega)}^{\tau_2(\omega)-1} |g(X_i(\omega))|$,

where $\tau_i$, $i = 1, 2$, are the first two times at which regenerations occur in the split chain (see Hobert et al. [20], Jones and Hobert [25] and Mykland, Tierney and Yu [38] for an introduction to the split chain). Lemma 2 in Bednorz and Łatuszyński [2] shows that, in this case, $E_\pi C^4 < \infty$. When the chain is only geometrically ergodic the representation of $C$ is not as clear, but we suspect that the results of Bednorz and Łatuszyński [2] can again be used to establish the moment condition on $C$, since it is still defined in terms of the split chain (see Csáki and Csörgő [7]).



3.2. Optimal batch sizes in terms of MSE. In this section, we will use the previous results to calculate optimal batch sizes. Chien, Goldsman and Melamed [6] and Song and Schmeiser [48] study the case of BM. Combining (8) and (9) yields

$\text{MSE}(\hat\sigma_{BM}^2) = \dfrac{\Gamma^2}{b_n^2} + \dfrac{2 b_n \sigma_g^4}{n} + o\left(\dfrac{1}{b_n^2}\right) + o\left(\dfrac{b_n}{n}\right)$.

Setting the derivative of the two leading terms with respect to $b_n$, namely $-2\Gamma^2/b_n^3 + 2\sigma_g^4/n$, equal to zero shows that $\text{MSE}(\hat\sigma_{BM}^2)$ will be minimized asymptotically by selecting the optimal batch size

$b_* = \left( \dfrac{\Gamma^2 n}{\sigma_g^4} \right)^{1/3}$.

Notice that this optimal batch size depends on $\Gamma^2/\sigma_g^4$, which is typically an unknown parameter relating to the process. However, this result implies that the optimal batch size should increase proportionally to $n^{1/3}$.
The main result of this section follows. 



Theorem 4. Let $X$ be a stationary geometrically ergodic Markov chain with invariant distribution $\pi$ and let $g: \mathsf{X} \to \mathbb{R}$ be a Borel function with $E_\pi |g|^{4+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$. Suppose Assumptions 2 and 3 hold and $E_\pi C^4 < \infty$, where $C$ is defined at (14). If $b_n^{-1} n^{1/2+\alpha} (\log n)^{3/2} \to 0$ as $n \to \infty$, where $\alpha = 1/(4 + \delta)$, then

(10) $\dfrac{n}{b_n} \text{Var}(\hat\sigma_n^2) = c\,\sigma_g^4 + o(1)$,

where $c = 2$ for BM and $c = 4/3$ for OBM.

Remark 10. The proof of Theorem 4 coupled with results from Damerdji [11] can be applied to uniformly ergodic Markov chains. Specifically, the condition on $b_n$ would be changed to $b_n^{-1} n^{1-\alpha'} (\log n)^{1/2} \to 0$ as $n \to \infty$, where $\alpha' < (\delta + 2 + \epsilon)/(24 + 12(\delta + 2 + \epsilon))$. However, if $b_n = \lfloor n^\nu \rfloor$, then $\nu$ must lie in $(11/12, 1)$, whereas Theorem 4 requires only $\nu \in (1/2 + \alpha, 1)$.

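Combining (8) and (10) yields

$\text{MSE}(\hat\sigma_{OBM}^2) = \dfrac{\Gamma^2}{b_n^2} + \dfrac{4 b_n \sigma_g^4}{3n} + o\left(\dfrac{1}{b_n^2}\right) + o\left(\dfrac{b_n}{n}\right)$,

which will be minimized asymptotically by selecting a batch size of

$b_* = \left( \dfrac{3\Gamma^2 n}{2\sigma_g^4} \right)^{1/3}$.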
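To make the batch-size formulas concrete, the sketch below evaluates the asymptotically MSE-optimal batch sizes $(\Gamma^2 n/\sigma_g^4)^{1/3}$ for BM and $(3\Gamma^2 n/(2\sigma_g^4))^{1/3}$ for OBM, which follow from minimizing the leading terms $\Gamma^2/b_n^2 + c\,b_n\sigma_g^4/n$ with $c = 2$ and $c = 4/3$, respectively. In practice $\Gamma$ and $\sigma_g^2$ are unknown; here, purely as an illustration, they are computed for the AR(1) chain of Section 4.1, whose autocovariances are $\gamma(s) = \rho^{s}/(1-\rho^2)$. The function names are hypothetical.

```python
import numpy as np


def ar1_gamma_sigma2(rho, max_lag=100_000):
    """Gamma = -2 sum_{s>=1} s gamma(s) and sigma^2 = sum_s gamma(s) for AR(1) with N(0, 1) noise."""
    s = np.arange(1, max_lag + 1, dtype=float)
    gam = rho ** s / (1.0 - rho ** 2)                 # gamma(s) = rho^s / (1 - rho^2), s >= 1
    Gamma = -2.0 * np.sum(s * gam)
    sigma2 = 1.0 / (1.0 - rho ** 2) + 2.0 * np.sum(gam)
    return Gamma, sigma2


def optimal_batch_sizes(Gamma, sigma2, n):
    """Return ((Gamma^2 n / sigma^4)^{1/3}, (3 Gamma^2 n / (2 sigma^4))^{1/3}) for BM and OBM."""
    b_bm = (Gamma ** 2 * n / sigma2 ** 2) ** (1.0 / 3.0)
    b_obm = (1.5 * Gamma ** 2 * n / sigma2 ** 2) ** (1.0 / 3.0)
    return b_bm, b_obm


for rho in (0.5, 0.95):
    Gamma, sigma2 = ar1_gamma_sigma2(rho)
    print(rho, optimal_batch_sizes(Gamma, sigma2, n=10_000))
```

In closed form $\Gamma^2/\sigma_g^4 = 4\rho^2/(1-\rho^2)^2$ for this chain, so with $n = 10^4$ the BM optimum is about 26 when $\rho = 0.5$ and about 156 when $\rho = 0.95$; the OBM optimum is larger by the factor $(3/2)^{1/3} \approx 1.14$.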

3.3. OBM versus BM. Comparing the conditions of Theorem 2 with the conditions of Proposition 3 in Jones et al. [24], which addresses the strong consistency of the BM estimator, we see that conditions (d) and (e) of Theorem 2 are not required for BM. For mean-square consistency with OBM, Theorems 3 and 4 require a moment condition on $C$ which is not necessary in Chien, Goldsman and Melamed [6] and Song and Schmeiser [48]; however, the moment conditions on $g$ and the mixing conditions on the Markov chain are much weaker in our results. Moreover, from an implementation point of view it is clear that OBM will require more computational resources than BM.

Why might we use OBM? Often $\hat\sigma_{OBM}^2$ has a lower asymptotic variance than $\hat\sigma_{BM}^2$. Specifically, note that (10) yields

$\dfrac{\text{Var}(\hat\sigma_{OBM}^2)}{\text{Var}(\hat\sigma_{BM}^2)} \to \dfrac{2}{3}$

as $n \to \infty$ (see Meketon and Schmeiser [34] for the same result under different assumptions). Also, in an effort to reduce the computational demands, Welch [52] argues that most of this benefit can be achieved by a modest amount of overlapping. For example, when using batches of size 64 and splitting each batch into 4 sub-batches of length 16, we only need the overlapping batches (of length 64) starting at $X_1, X_{17}, X_{33}, X_{49}, X_{65}, \ldots$.
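The limiting variance ratio of 2/3 can be seen in a small Monte Carlo experiment. The sketch below assumes i.i.d. N(0, 1) data (so $\sigma_g^2 = 1$), $n = 10{,}000$, $b_n = 100$ and 400 replications, all chosen only for illustration; the helper names are hypothetical.

```python
import numpy as np


def bm_variance(y, b):
    """Nonoverlapping batch means estimate (partial last batch dropped)."""
    a = len(y) // b
    y = y[: a * b]
    means = y.reshape(a, b).mean(axis=1)
    return b * np.sum((means - y.mean()) ** 2) / (a - 1)


def obm_variance(y, b):
    """Overlapping batch means estimate from cumulative sums."""
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    means = (csum[b:] - csum[:-b]) / b
    return n * b * np.sum((means - y.mean()) ** 2) / ((n - b) * (n - b + 1))


def variance_ratio(n=10_000, b=100, reps=400, seed=0):
    """Monte Carlo estimate of Var(OBM estimator) / Var(BM estimator) for i.i.d. N(0, 1) data."""
    rng = np.random.default_rng(seed)
    bm = np.empty(reps)
    obm = np.empty(reps)
    for r in range(reps):
        y = rng.standard_normal(n)
        bm[r] = bm_variance(y, b)
        obm[r] = obm_variance(y, b)
    return obm.var(ddof=1) / bm.var(ddof=1)


print(variance_ratio())
```

The returned ratio should be in the vicinity of 2/3; with only 400 replications, deviations of several hundredths are to be expected.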
4. Examples. In this section, we compare BM, OBM, RS and SV via their finite-sample properties in two examples: one is a simple AR(1) model, while the other is a more realistic Bayesian probit regression model.

4.1. AR(1) model. Consider the following AR(1) model:

$X_i = \rho X_{i-1} + \epsilon_i$ for $i = 1, 2, \ldots$,

where the $\epsilon_i$ are i.i.d. N(0, 1). As long as $|\rho| < 1$ this Markov chain is geometrically ergodic with invariant distribution $N(0, 1/(1 - \rho^2))$. Also, $\text{cov}_\pi(X_i, X_j) = \rho^{|i-j|}/(1 - \rho^2)$.

Consider estimating $E_\pi X$ with $\bar x_n$. Clearly, a CLT holds and, indeed, summing the autocovariances gives $\sigma^2 = \sum_{k=-\infty}^{\infty} \rho^{|k|}/(1 - \rho^2) = 1/(1 - \rho)^2$. Thus, if $\hat\sigma_n^2$ is a strongly consistent estimate of $\sigma^2$, an asymptotically valid confidence interval is given by

(11) $\bar x_n \pm t_* \dfrac{\hat\sigma_n}{\sqrt{n}}$,

where $t_*$ is an appropriate Student's t quantile. Our goal is to evaluate the finite-sample properties of this interval when $\hat\sigma_n^2$ is produced by BM, OBM and SV methods with $b_n = \lfloor n^\nu \rfloor$ for some $\nu$ and $\rho \in \{0.5, 0.95\}$. For SV estimators, we will consider the Tukey-Hanning window (TH) and the modified Bartlett window (Brt). For BM the degrees of freedom for $t_*$ are $a_n - 1$, and for OBM, TH and Brt the degrees of freedom are $n - b_n$.

We compare the effects of using different batch sizes and variance estimation techniques on the coverage probabilities of a nominal 95% interval as at (11) for the two settings of $\rho$. This comparison is based upon the results of 2000 independent replications of the following procedure. In each replication we simulated the AR(1) chain for 1e5 iterations. We then constructed the interval estimate using BM, OBM, Brt and TH with three different sampling plans, $b_n = \lfloor n^\nu \rfloor$ where $\nu \in \{1/3, 1/2, 2/3\}$, at five different points (1e3, 5e3, 1e4, 5e4 and 1e5 iterations).

The results with $\rho = 0.5$ are summarized in Table 1. In the calculations with $\nu = 1/3$ and $\nu = 1/2$, all of the calculated coverage probabilities are within 2 standard errors of the nominal 0.95 level when at least 5e3 iterations are used. We can also see that for all the settings, the coverage probabilities improve as the number of iterations increases. The choice of $\nu = 2/3$ seems to produce slightly low coverage probabilities for small numbers of iterations. Basically, when $\rho = 0.5$ the estimation problem is relatively easy, and all the methods and settings seem to perform well.

Consider the case with $\rho = 0.95$. Table 2 shows the observed coverage probabilities. We can see that the coverage probabilities get closer to the nominal 0.95 level as the number of iterations increases. Also, as $\nu$ increases, the confidence intervals become more accurate because the strong correlation in the observations is better captured with the larger batch size or truncation point. In this case, the choice of $\nu = 1/3$ performs much worse than the other options analyzed.

Table 1
Table of coverage probabilities for 2000 replications using the AR(1) example with $\rho = 0.5$. All calculations were based on the nominal level of 0.95. The standard errors for these numbers are easily calculated as $\sqrt{\hat p(1 - \hat p)/2000}$, which results in a largest standard error of 6.4e-3.

                 Number of iterations
Method    1e3      5e3      1e4      5e4      1e5

$b_n = \lfloor n^{1/3} \rfloor$
BM        0.9315   0.939    0.937    0.942    0.943
Brt       0.93     0.9395   0.936    0.9415   0.944
OBM       0.9305   0.9395   0.936    0.942    0.944
TH        0.936    0.9465   0.9395   0.9465   0.947

$b_n = \lfloor n^{1/2} \rfloor$
BM        0.9415   0.948    0.939    0.947    0.949
Brt       0.933    0.946    0.935    0.947    0.9475
OBM       0.9385   0.947    0.9355   0.9475   0.9475
TH        0.9365   0.9465   0.9365   0.948    0.948

$b_n = \lfloor n^{2/3} \rfloor$
BM        0.9475   0.9445   0.9385   0.95     0.9465
Brt       0.9105   0.9265   0.9275   0.9445   0.9425
OBM       0.9245   0.935    0.932    0.947    0.944
TH        0.9115   0.927    0.927    0.9435   0.9425

Comparing the results in Tables 1 and 2, we see that for highly correlated chains a larger simulation effort and a larger $b_n$ are required to achieve good coverage. However, with a lower correlation, larger values of $b_n$ can result in worse coverage probabilities, especially for smaller simulation efforts.
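A scaled-down version of the coverage study can be run in a few seconds. This sketch is not the code behind Tables 1 and 2: it uses only BM, 200 replications instead of 2000, and starts each chain from the stationary distribution $N(0, 1/(1-\rho^2))$, which is an assumption since the initial distribution is not stated above. It reports the fraction of nominal 95% intervals $\bar x_n \pm t_* \hat\sigma_n/\sqrt{n}$ that cover $E_\pi X = 0$. The names are hypothetical.

```python
import numpy as np
from scipy import stats


def ar1_paths(reps, n, rho, rng):
    """reps independent stationary AR(1) paths of length n, one path per row."""
    x = np.empty((reps, n))
    x[:, 0] = rng.standard_normal(reps) / np.sqrt(1.0 - rho ** 2)
    e = rng.standard_normal((reps, n))
    for i in range(1, n):
        x[:, i] = rho * x[:, i - 1] + e[:, i]
    return x


def bm_coverage(rho, n, nu, reps=200, seed=0):
    """Fraction of nominal 95% BM intervals that cover E_pi X = 0, with b_n = floor(n^nu)."""
    rng = np.random.default_rng(seed)
    x = ar1_paths(reps, n, rho, rng)
    b = int(np.floor(n ** nu + 1e-9))              # guard against 1000**(1/3) == 9.999...
    a = n // b
    x = x[:, : a * b]                               # keep n = a * b
    means = x.reshape(reps, a, b).mean(axis=2)      # batch means, one row per replication
    xbar = x.mean(axis=1)
    sigma2 = b * np.sum((means - xbar[:, None]) ** 2, axis=1) / (a - 1)
    half = stats.t.ppf(0.975, df=a - 1) * np.sqrt(sigma2 / (a * b))
    return np.mean(np.abs(xbar) <= half)


print(bm_coverage(0.5, 5000, 0.5), bm_coverage(0.95, 1000, 1 / 3))
```

With $\rho = 0.5$, $n = 5000$ and $\nu = 1/2$ the first value should be close to 0.95, while $\rho = 0.95$, $n = 1000$ and $\nu = 1/3$ should give markedly low coverage, in line with Table 2.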

4.2. Bayesian probit regression. Consider the Lupus Data from van Dyk and Meng [51]. This example is concerned with the occurrence of latent membranous lupus nephritis using $y_i$, an indicator of the disease (1 for present), $x_{i1}$, the difference between IgG3 and IgG4 (immunoglobulin G), and $x_{i2}$, IgA (immunoglobulin A), where $i = 1, \ldots, 55$. Suppose

$\Pr(y_i = 1) = \Phi(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})$,

where $\Phi$ is the standard normal distribution function, and assign a flat prior (three-dimensional Lebesgue measure) on $\beta := (\beta_0, \beta_1, \beta_2)$. Roy and Hobert [47] verify that the resulting posterior distribution is proper. Our goal is to estimate the posterior expectation of $\beta$, $E_\pi\beta$.

Table 2
Table of coverage probabilities for 2000 replications using the AR(1) example with $\rho = 0.95$. All calculations were based on the nominal level of 0.95. The standard errors for these numbers are easily calculated as $\sqrt{\hat p(1 - \hat p)/2000}$, which results in a largest standard error of 0.011.

                 Number of iterations
Method    1e3      5e3      1e4      5e4      1e5

$b_n = \lfloor n^{1/3} \rfloor$
BM        0.614    0.738    0.766    0.842    0.872
Brt       0.606    0.736    0.764    0.841    0.871
OBM       0.61     0.736    0.764    0.842    0.872
TH        0.61     0.74     0.77     0.854    0.886

$b_n = \lfloor n^{1/2} \rfloor$
BM        0.838    0.903    0.9155   0.94     0.9425
Brt       0.807    0.893    0.911    0.9365   0.9385
OBM       0.821    0.895    0.913    0.937    0.9395
TH        0.822    0.9055   0.9235   0.943    0.945

$b_n = \lfloor n^{2/3} \rfloor$
BM        0.927    0.9385   0.933    0.948    0.9465
Brt       0.872    0.916    0.9185   0.944    0.942
OBM       0.89     0.925    0.924    0.9455   0.943
TH        0.885    0.92     0.924    0.9435   0.9425

We will sample from $\pi(\beta \mid y)$ using the PX-DA algorithm of Liu and Wu [31]. Let $X$ be the $55 \times 3$ design matrix whose $i$th row is $x_i^T = (1, x_{i1}, x_{i2})$, and let $TN(\mu, \sigma^2, w)$ denote a normal distribution with mean $\mu$ and variance $\sigma^2$ that is truncated to be positive if $w = 1$ and negative if $w = 0$. Then one iteration $\beta \to \beta'$ requires:

1. Draw $z_1, \ldots, z_{55}$ independently with $z_i \sim TN(x_i^T \beta, 1, y_i)$.

2. Draw $g^2 \sim \text{Gamma}\left( \frac{55}{2}, \frac{1}{2} \sum_{i=1}^{55} \left[ z_i - x_i^T (X^T X)^{-1} X^T z \right]^2 \right)$ and set $z' = (g z_1, \ldots, g z_{55})^T$.

3. Draw $\beta' \sim N\left( (X^T X)^{-1} X^T z', (X^T X)^{-1} \right)$.

Roy and Hobert [47] prove that this sampler is geometrically ergodic. 
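For readers who want to experiment with the sampler, here is a minimal NumPy/SciPy sketch of one PX-DA iteration following steps 1-3. Two points are assumptions rather than statements from the text: the second Gamma parameter in step 2 is treated as a rate (so $g^2$ is drawn with scale $2/\text{RSS}$, where RSS is the residual sum of squares appearing in step 2), and, because the lupus data are not reproduced here, the example runs on hypothetical synthetic data with 55 observations, an intercept and two covariates. `scipy.stats.truncnorm` expects its truncation bounds in standardized units, which is why the bounds are written in terms of $-\mu$.

```python
import numpy as np
from scipy import stats


def px_da_step(beta, X, y, XtX_inv, chol, rng):
    """One PX-DA update beta -> beta' (steps 1-3)."""
    mu = X @ beta
    # Step 1: z_i ~ TN(x_i^T beta, 1, y_i); bounds are given as (bound - loc) / scale.
    lower = np.where(y == 1, -mu, -np.inf)
    upper = np.where(y == 1, np.inf, -mu)
    z = stats.truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
    # Step 2: g^2 ~ Gamma(shape = n/2, rate = RSS/2), RSS from the least-squares fit of z on X.
    resid = z - X @ (XtX_inv @ (X.T @ z))
    g2 = rng.gamma(shape=len(y) / 2.0, scale=2.0 / np.sum(resid ** 2))
    z_prime = np.sqrt(g2) * z
    # Step 3: beta' ~ N((X^T X)^{-1} X^T z', (X^T X)^{-1}).
    return XtX_inv @ (X.T @ z_prime) + chol @ rng.standard_normal(X.shape[1])


def run_px_da(X, y, beta0, n_iter, seed=0):
    rng = np.random.default_rng(seed)
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)               # chol @ chol.T equals (X^T X)^{-1}
    beta = np.asarray(beta0, dtype=float)
    draws = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        beta = px_da_step(beta, X, y, XtX_inv, chol, rng)
        draws[t] = beta
    return draws


# Hypothetical synthetic data standing in for the lupus data.
data_rng = np.random.default_rng(42)
X = np.column_stack([np.ones(55), data_rng.standard_normal((55, 2))])
y = (X @ np.array([-0.5, 1.0, 0.5]) + data_rng.standard_normal(55) > 0).astype(int)
draws = run_px_da(X, y, beta0=np.zeros(3), n_iter=2000)
print(draws.mean(axis=0))
```

Replacing the synthetic `X` and `y` with the actual design matrix and responses would give draws whose component averages estimate $E_\pi\beta$.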

Just as with the AR(1) example, we want to compare observed coverage 
probabilities based on different methods of estimating the variance of the 
asymptotic normal distribution. Of course, we need the true value of the 
unknown posterior expectation $E_\pi \beta$ to do this. To solve this problem, we 
calculated an estimate of $E_\pi \beta$ from a long simulation (1e8 iterations) of the 
PX-DA chain and declared the observed sample averages to be the truth. 
Table 3 shows the observed sample averages along with MCSEs, which were 
calculated using BM with a batch size of $b_n = n^{1/2}$. 
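As a quick consistency check on Table 3, the MCSE of a sample mean is $\hat\sigma/\sqrt{n}$. The sketch below assumes that the middle row of Table 3 holds the BM estimates $\hat\sigma$ for each component; with $n = 10^8$ the division by $\sqrt{n} = 10^4$ reproduces the reported MCSEs. The function name `mcse` is illustrative.

```python
import math


def mcse(sigma_hat, n):
    """Monte Carlo standard error of a sample mean computed from n draws."""
    return sigma_hat / math.sqrt(n)


# Assumed reading of Table 3: sigma-hat for beta_0, beta_1, beta_2 with n = 1e8.
sigma_hats = {"beta_0": 11.85, "beta_1": 22.60, "beta_2": 14.74}
for name, s in sigma_hats.items():
    print(name, f"{mcse(s, 1e8):.2e}")
```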

Table 3

Values treated as the "truth" for estimating confidence interval coverage probabilities based on 1e8 iterations

              β0          β1          β2
Estimate      -3.0166     6.9107      3.9792
σ̂             11.85       22.60       14.74
MCSE          1.18e-3     2.26e-3     1.47e-3

We now turn our attention to comparing coverage probabilities in the context of fixed-width methodology, described in Section 1, for each combination of BM, OBM, Brt and TH with sampling plans $b_n = \lfloor n^{\nu} \rfloor$ where $\nu \in \{1/3, 1/2\}$. Suppose we want to estimate each component of $E_\pi \beta$ to within $\epsilon \in \{0.1, 0.2, 0.3\}$ while requiring a minimum simulation effort of $n^* \in \{5\mathrm{e}4, 1\mathrm{e}4, 5\mathrm{e}3\}$, respectively. (Recall that the asymptotics require the desired half-width $\epsilon \to 0$ to achieve an asymptotically valid interval.) That is, using the PX-DA algorithm started from the maximum likelihood estimate $\hat\beta = (-1.778, 4.374, 2.482)$, we terminate the simulation the first time the following inequality holds:

$$\max_{j=0,1,2} \frac{t_*\,\hat\sigma_{\beta_j}}{\sqrt{n}} + \epsilon\, I(n < n^*) + n^{-1} \le \epsilon, \tag{12}$$

where $t_*$ is the appropriate critical value for a nominal 95% interval and $\hat\sigma_{\beta_j}$ is an estimate of $\sigma_{\beta_j}$ obtained under the settings described above. If (12) was not satisfied, then an additional 10% of the current number of iterations were simulated before checking the criterion again. To estimate the coverage probabilities we obtained 1000 independent replications of this procedure. Table 4 shows the estimated coverage probabilities and the mean number of iterations at termination.
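The sequential procedure built on rule (12) can be sketched as follows. This is a minimal illustration under stated assumptions: $\hat\sigma_{\beta_j}$ comes from BM with $b_n = \lfloor n^{1/2} \rfloor$, the critical value is the normal 1.96 rather than a $t$ quantile, and the chain is supplied by a hypothetical `step(state, rng)` callback (an AR(1) stand-in below, not PX-DA). The names `stop_rule_met` and `fixed_width` are illustrative.

```python
import math
import numpy as np


def bm_sigma(x):
    """Batch means estimate of sigma (not sigma^2) with b = floor(sqrt(n))."""
    n = len(x)
    b = math.isqrt(n)
    a = n // b
    batch_means = np.asarray(x[: a * b], dtype=float).reshape(a, b).mean(axis=1)
    return math.sqrt(b * np.var(batch_means, ddof=1))


def stop_rule_met(chain, eps, n_star, t_crit):
    """Criterion (12): max_j t* sigma_j / sqrt(n) + eps I(n < n*) + 1/n <= eps."""
    n, d = chain.shape
    half_width = max(t_crit * bm_sigma(chain[:, j]) / math.sqrt(n) for j in range(d))
    return half_width + eps * (n < n_star) + 1.0 / n <= eps


def fixed_width(step, x0, eps, n_star, rng, t_crit=1.96, n_start=1000):
    """Run the chain, adding 10% more iterations each time (12) fails."""
    states = [np.atleast_1d(np.asarray(x0, dtype=float))]
    while len(states) < n_start:
        states.append(step(states[-1], rng))
    while True:
        chain = np.vstack(states)
        if stop_rule_met(chain, eps, n_star, t_crit):
            return chain.mean(axis=0), chain.shape[0]
        for _ in range(max(1, len(states) // 10)):
            states.append(step(states[-1], rng))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    ar1 = lambda x, r: 0.5 * x + r.normal(size=x.shape)  # hypothetical stand-in chain
    est, n_used = fixed_width(ar1, [0.0], eps=0.1, n_star=1000, rng=rng)
    print(est, n_used)
```

The term $\epsilon I(n < n^*)$ forces at least $n^*$ iterations, and $n^{-1}$ guards against stopping on a spuriously small $\hat\sigma$ early on.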

When $\nu = 1/3$, the results are terrible with any method for estimating 
$\sigma_g^2$ or value of $\epsilon$. We can see that even when the simulations are run for 
more than 1e5 iterations (when $\epsilon = 0.1$), the resulting coverage probabilities 
are poor. When $\nu = 1/2$, all of the methods result in coverage probabilities 
slightly lower than the nominal 0.95 level. It also appears that using TH 
results in slightly better coverage probabilities while requiring slightly more 
simulation effort. Somewhat surprisingly, the observed coverage probabilities 
did not uniformly improve as $\epsilon$ decreased. 

To this point we have examined the performance of multiple confidence 
intervals individually, but there is an obvious inherent multiplicity issue. 
Thus, we use a Bonferroni correction to calculate simultaneous confidence 
intervals. We maintain all settings described above with $\epsilon = 0.2$, except that 
instead of using nominal 95% we use nominal 98 1/3% confidence intervals. 
Table 5 shows the estimated coverage probabilities for $E_\pi \beta$ and the mean 
number of iterations at termination based on 1000 independent replications. 
With $\nu = 1/3$, the results have improved but are still quite poor. However, 
in the $\nu = 1/2$ case, all individual confidence intervals perform well, with observed 
coverage probabilities close to the nominal 0.9833 level. In addition, 
the simultaneous intervals have observed coverage probabilities greater than 
the 0.95 nominal level. 
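The 98 1/3% level is the Bonferroni split of a joint 5% error rate across the three components of $\beta$: each interval gets level $1 - 0.05/3$. The sketch below computes that level and, as an illustrative assumption, the corresponding two-sided normal critical value; the function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_level(alpha, m):
    """Per-interval confidence level so that m intervals hold jointly w.p. >= 1 - alpha."""
    return 1.0 - alpha / m


def normal_critical_value(level):
    """Two-sided standard normal critical value for a given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)


level = bonferroni_level(0.05, 3)  # beta_0, beta_1, beta_2
print(level, normal_critical_value(level))
```

By the Bonferroni inequality the simultaneous coverage is at least $1 - 3(1 - 0.98\overline{3}) = 0.95$, which is why the simultaneous column of Table 5 is compared against 0.95.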






J. M. FLEGAL AND G. L. JONES 



Table 4

Summary of results for using fixed-width methods for the Lupus data Bayesian probit regression. Coverage probabilities using calculated half-width have MCSEs between 1.1e-2 and 1.5e-2 when $b_n = \lfloor n^{1/3} \rfloor$ and between 7.3e-3 and 8.7e-3 when $b_n = \lfloor n^{1/2} \rfloor$. The table also shows the mean (s.e.) simulation effort at termination in terms of number of iterations.

                 b_n = ⌊n^{1/3}⌋                         b_n = ⌊n^{1/2}⌋
Method  ε     β0     β1     β2     n              β0     β1     β2     n
BM      0.3   0.699  0.709  0.704  6.43e3 (33)    0.921  0.925  0.927  1.85e4 (89)
Brt           0.699  0.708  0.705  6.34e3 (32)    0.926  0.923  0.930  1.81e4 (81)
OBM           0.700  0.710  0.706  6.37e3 (32)    0.926  0.922  0.932  1.83e4 (81)
TH            0.699  0.710  0.703  6.53e3 (34)    0.936  0.938  0.941  1.95e4 (85)
BM      0.2   0.794  0.781  0.782  1.91e4 (60)    0.922  0.929  0.930  4.54e4 (154)
Brt           0.797  0.775  0.781  1.89e4 (60)    0.927  0.928  0.934  4.48e4 (141)
OBM           0.796  0.777  0.781  1.90e4 (60)    0.928  0.928  0.936  4.50e4 (140)
TH            0.804  0.790  0.794  1.97e4 (61)    0.940  0.944  0.944  4.79e4 (144)
BM      0.1   0.854  0.849  0.853  1.11e5 (187)   0.923  0.920  0.917  1.94e5 (454)
Brt           0.851  0.850  0.850  1.11e5 (185)   0.925  0.922  0.917  1.93e5 (410)
OBM           0.853  0.850  0.850  1.11e5 (185)   0.925  0.922  0.917  1.93e5 (412)
TH            0.860  0.858  0.850  1.18e5 (195)   0.932  0.929  0.932  2.02e5 (419)



Table 5

Summary of results for using fixed-width methods with a Bonferroni correction for the Lupus data Bayesian probit regression. Coverage probabilities using calculated half-width have MCSEs of between 9e-3 and 1e-2 when $b_n = \lfloor n^{1/3} \rfloor$ and between 4.9e-3 and 6e-3 when $b_n = \lfloor n^{1/2} \rfloor$. The table also shows the mean (s.e.) simulation effort at termination in terms of number of iterations.

Method   b_n          β0      β1      β2      Simultaneous   n
BM       ⌊n^{1/3}⌋    0.911   0.904   0.906   0.878          3.21e4 (85)
Brt                   0.909   0.904   0.909   0.879          3.19e4 (85)
OBM                   0.910   0.905   0.909   0.879          3.20e4 (85)
TH                    0.917   0.909   0.911   0.884          3.34e4 (87)
BM       ⌊n^{1/2}⌋    0.973   0.975   0.972   0.965          6.97e4 (210)
Brt                   0.973   0.971   0.974   0.965          6.86e4 (189)
OBM                   0.973   0.970   0.972   0.963          6.90e4 (191)
TH                    0.973   0.976   0.972   0.969          7.30e4 (195)

4.2.1. Comparison to regeneration. Finally, we compare BM, Brt, OBM and TH with regenerative simulation (RS) in terms of coverage probabilities. RS is often viewed as the gold standard; see the references in Section 1. We are still using the model given above and sampling via the PX-DA algorithm, but we are not concerned with fixed-width methodology in this section. Roy and Hobert [47] implement RS for this example and we use their settings exactly, except that we use only 50 regenerations. We obtained 1000 independent replications, resulting in a mean simulation effort of 7.12e5 (3.2e3). For a fair comparison in terms of simulation effort, confidence intervals for $E_\pi \beta$ were calculated using BM, OBM and SV with $b_n = \lfloor n^{1/2} \rfloor$ from 1000 independent replications with 7e5 iterations each. Table 6 shows the resulting coverage probabilities for $E_\pi \beta$. The coverage probabilities from BM, Brt, OBM and TH are all slightly larger than those of RS; however, the results from all the methods are within two standard errors of the nominal 0.95 level.

Table 6

Coverage probabilities comparing BM, OBM, and SV using 7e5 iterations and $b_n = \lfloor n^{1/2} \rfloor$ to RS. MCSEs vary between 6.9e-3 and 7.6e-3.

Method   β0      β1      β2
BM       0.945   0.945   0.950
Brt      0.941   0.942   0.948
OBM      0.942   0.942   0.948
TH       0.946   0.945   0.950
RS       0.938   0.938   0.938
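The "within two standard errors" comparison uses the binomial standard error $\sqrt{\hat p(1-\hat p)/R}$ for an observed coverage $\hat p$ from $R$ replications, the same formula quoted in the caption of Table 2. A small sketch (function names illustrative) that reproduces the MCSE ranges in the captions and checks every entry of Table 6:

```python
import math


def coverage_se(p_hat, reps):
    """Standard error of an estimated coverage probability from `reps` replications."""
    return math.sqrt(p_hat * (1.0 - p_hat) / reps)


def within_two_se(p_hat, reps, nominal=0.95):
    """Is the observed coverage within two standard errors of the nominal level?"""
    return abs(p_hat - nominal) <= 2.0 * coverage_se(p_hat, reps)


# Table 6 coverage probabilities (1000 replications each)
table6 = {"BM": (0.945, 0.945, 0.950), "Brt": (0.941, 0.942, 0.948),
          "OBM": (0.942, 0.942, 0.948), "TH": (0.946, 0.945, 0.950),
          "RS": (0.938, 0.938, 0.938)}
for method, cover in table6.items():
    print(method, [within_two_se(p, 1000) for p in cover])
```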

4.3. Summary. In our examples, we considered different estimators of $\sigma_g^2$. 
All of the methods considered resulted in roughly similar performance in 
terms of estimated coverage probabilities. Recall that OBM and Brt are asymptotically 
equivalent, and the simulation results show there is little difference 
between the two in finite samples. However, the TH estimator tends to perform 
slightly better than OBM and Brt. In our experience, Brt (and SV 
methods in general) tended to be slightly faster than OBM from a computational 
perspective. As suggested by the theoretical results in Section 3.3, 
the BM estimator was observed to be more variable than OBM. This 
was consistent in both examples across multiple realizations of the simulation. 
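For concreteness, here is a minimal NumPy sketch of the two batch means estimators compared above. It assumes the non-overlapping form $\hat\sigma^2_{\mathrm{BM}} = \frac{b_n}{a_n - 1}\sum_{k}(\bar Y_k - \bar Y)^2$ and the overlapping form $\frac{nb_n}{(n-b_n)(n-b_n+1)}\sum_{j=0}^{n-b_n}(\bar Y_j(b_n) - \bar Y_n)^2$ used in the appendices. The AR(1) chain is a hypothetical test case: with $g(x) = x$ and unit innovation variance, $\sigma_g^2 = 1/(1-\rho)^2$. Function names are illustrative.

```python
import math
import numpy as np


def bm_var(y, b):
    """Non-overlapping batch means estimate of sigma_g^2 with batch size b."""
    y = np.asarray(y, dtype=float)
    a = y.size // b                                  # number of full batches
    batch_means = y[: a * b].reshape(a, b).mean(axis=1)
    # centered at the mean of the retained a*b points
    return b / (a - 1) * np.sum((batch_means - batch_means.mean()) ** 2)


def obm_var(y, b):
    """Overlapping batch means: n b / ((n-b)(n-b+1)) * sum_j (Ybar_j(b) - Ybar_n)^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    c = np.concatenate(([0.0], np.cumsum(y)))
    window_means = (c[b:] - c[:-b]) / b              # the n - b + 1 overlapping batch means
    return n * b / ((n - b) * (n - b + 1)) * np.sum((window_means - y.mean()) ** 2)


def ar1(n, rho, rng):
    """AR(1) chain X_t = rho X_{t-1} + e_t with N(0, 1) errors, started in stationarity."""
    x = np.empty(n)
    x[0] = rng.normal() / math.sqrt(1 - rho ** 2)
    e = rng.normal(size=n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = ar1(100_000, 0.5, rng)
    b = math.isqrt(x.size)                           # b_n = floor(n^{1/2})
    print(bm_var(x, b), obm_var(x, b))               # both near 1/(1 - 0.5)^2 = 4
```

Repeating this over many seeds should show the BM estimates spreading more widely than the OBM estimates, in line with the variance ratio of 2 versus 4/3 in Lemma 11.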

Using the Bayesian probit regression model we compared our methods 
to RS. The resulting simulation showed that all of the methods performed very 
well. The advantage of RS is that in the fixed-width setting the actual chain 
does not need to be stored as the simulation progresses. However, RS carries 
a theoretical cost that, while not overly burdensome in our view, may 
dissuade the typical practitioner. The resulting simulation is also dependent 
on the length of the regeneration tours, which can be extremely long even in 
moderately large finite state spaces, as the dimension of the Markov chain 
increases (Gilks, Roberts and Sahu [14]), or in variable-at-a-time Metropolis-Hastings 
implementations (Neath and Jones [39]). In contrast, BM, OBM 
and SV are relatively simple to implement, though they can require saving 
the entire chain if fixed-width methods are employed. Given the current 
price of computer memory, this is clearly not the obstacle it was in the past. 

Another simulation goal was to investigate the finite-sample behavior 
of different batch size selections. Theoretically, we showed that the batch 
size should increase at a rate proportional to $n^{1/3}$, but the proportionality 
constant is unknown. In our examples, using $b_n = \lfloor n^{1/3} \rfloor$ seemed to give 
very poor results because the batch size or truncation point was too small. 
In realistic examples with higher correlations, the larger batch size 
$b_n = \lfloor n^{1/2} \rfloor$ worked well, agreeing with the previous work of Jones et al. [24]. 
The choice $b_n = \lfloor n^{2/3} \rfloor$ also worked well in high-correlation settings, though 
for long chains more computational effort was required. 

On balance, we would recommend using the SV method with the Tukey-Hanning 
window with $b_n = \lfloor n^{1/2} \rfloor$ as a default method. If the required moment 
conditions for the SV methods were too much, then we would employ 
OBM. 
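A sketch of the recommended default, the SV estimator $\hat\gamma_n(0) + 2\sum_{s=1}^{b_n-1} w_n(s)\hat\gamma_n(s)$ with $b_n = \lfloor n^{1/2} \rfloor$. It assumes the Tukey-Hanning weights $w_n(s) = [1 + \cos(\pi s/b_n)]/2$ and Bartlett weights $w_n(s) = 1 - s/b_n$ for $0 \le s < b_n$; these forms should be checked against the window definitions given earlier in the paper before reuse. Function names are illustrative.

```python
import math
import numpy as np


def lag_window(s, b, kind):
    """Lag-window weight w_n(s) for 0 <= s < b (assumed standard forms)."""
    if kind == "tukey-hanning":
        return (1.0 + math.cos(math.pi * s / b)) / 2.0
    if kind == "bartlett":
        return 1.0 - s / b
    raise ValueError(f"unknown window: {kind}")


def sv_var(y, b, kind="tukey-hanning"):
    """Spectral variance estimate gamma_n(0) + 2 * sum_{s=1}^{b-1} w_n(s) gamma_n(s)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = y - y.mean()
    total = d @ d / n                          # gamma_n(0)
    for s in range(1, b):
        gamma_s = d[:-s] @ d[s:] / n           # gamma_n(s) = n^{-1} sum_t d_t d_{t+s}
        total += 2.0 * lag_window(s, b, kind) * gamma_s
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, rho = 100_000, 0.5
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):                      # hypothetical AR(1) test chain
        x[t] = rho * x[t - 1] + e[t]
    b = math.isqrt(n)                          # default b_n = floor(n^{1/2})
    print(sv_var(x, b), sv_var(x, b, "bartlett"))   # both near 1/(1 - rho)^2 = 4
```

The loop costs $O(nb_n)$ operations; an FFT-based autocovariance would be faster for very long chains but is harder to read.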



APPENDIX A: BROWNIAN MOTION

Let $B = \{B(t), t \ge 0\}$ denote a standard Brownian motion. Define $\bar B_j(k) := k^{-1}(B(j+k) - B(j))$ and $\bar B_n := n^{-1}B(n)$. We will require the following two results from Csörgő and Révész [8] on the increments of Brownian motion.

Lemma 1. For all $\epsilon > 0$ and for almost all sample paths there exists $n_0(\epsilon)$ such that for all $n \ge n_0$,
$$|B(n)| < (1+\epsilon)[2n\log\log n]^{1/2}.$$

Lemma 2. Suppose Assumption 2 holds. Then for all $\epsilon > 0$ and for almost all sample paths, there exists $n_0(\epsilon)$ such that for all $n \ge n_0$,
$$\sup_{0 \le t \le n-b_n}\ \sup_{0 \le s \le b_n}|B(t+s) - B(t)| < (1+\epsilon)\Big(2b_n\Big(\log\frac{n}{b_n} + \log\log n\Big)\Big)^{1/2}.$$

A strong invariance principle holds if there exists a nonnegative increasing function $\psi(n)$ on the positive integers, a constant $0 < \sigma_g < \infty$ and a sufficiently rich probability space $\Omega$ such that
$$\Big|\sum_{i=1}^n g(X_i) - nE_\pi g - \sigma_g B(n)\Big| = O(\psi(n)) \quad \text{w.p. 1 as } n \to \infty, \tag{13}$$
where the w.p. 1 in (13) means for almost all sample paths. Alternatively, (13) can be expressed as follows: there exist $n_0$ and a finite random variable $C$ such that for almost all $\omega \in \Omega$,
$$\Big|\sum_{i=1}^n g(X_i) - nE_\pi g - \sigma_g B(n)\Big| < C(\omega)\psi(n) \tag{14}$$
for all $n > n_0$. A strong invariance principle is enough to guarantee a strong law (1), a central limit theorem (2) and a functional central limit theorem, among other properties (see Philipp and Stout [40] and Damerdji [9]). We will rely on the following result to connect the strong invariance principle to the convergence rate of a Harris ergodic Markov chain (see Bednorz and Łatuszyński [2] and Jones et al. [24] for a proof).

Lemma 3. Let $g: \mathsf{X} \to \mathbb{R}$ be a Borel function and let $X$ be a geometrically ergodic Markov chain with invariant distribution $\pi$. Suppose Assumption 3 holds and $E_\pi|g|^{2+\delta+\epsilon} < \infty$ for some $\delta > 0$ and some $\epsilon > 0$. Then a strong invariance principle holds with $\psi(n) = n^\alpha\log n$ where $\alpha = 1/(2+\delta)$.



APPENDIX B: STRONG CONSISTENCY PROOFS 

B.1. Proof of Theorem 1. The proof will be constructed in 3 stages given in lemmas below, but first we define some notation. Recall that $X = \{X_1, X_2, \dots\}$ is a Harris ergodic Markov chain with invariant distribution $\pi$ and $Y_i = g(X_i) - E_\pi g$ for $i = 1, 2, 3, \dots$. Further define $\bar Y_j(k) = k^{-1}\sum_{i=1}^{k} Y_{j+i}$ for $j = 0, \dots, n-b_n$ and $k = 1, \dots, b_n$, and $\bar Y_n = n^{-1}\sum_{i=1}^n Y_i$. Next let
$$\hat\sigma^2_{w,n} := \frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}k^2\Delta_2 w_n(k)[\bar Y_j(k) - \bar Y_n]^2$$
and
$$\tilde\sigma^2_{w,n} := \frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}k^2\Delta_2 w_n(k)[\bar B_j(k) - \bar B_n]^2.$$


Lemma 4. Suppose (13) holds with $\psi(n) = n^\alpha\log n$ where $\alpha = 1/(2+\delta)$ and Assumptions 1 and 2 hold. If, as $n \to \infty$,
$$b_n^{1/2}n^{\alpha}(\log n)^{3/2}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \to 0 \tag{15}$$
and
$$n^{2\alpha}(\log n)^2\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \to 0, \tag{16}$$
then $\hat\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{w,n} \to 0$ w.p. 1.



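Proof. Notice
$$\hat\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{w,n} = \frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}\Delta_2 w_n(k)\Big\{k^2[\bar Y_j(k) - \bar Y_n]^2 - \sigma_g^2k^2[\bar B_j(k) - \bar B_n]^2\Big\}.$$
Let $A_k = k[\bar Y_j(k) - \sigma_g\bar B_j(k)]$, $D_k = B(j+k) - B(j)$, $E_{n,k} = k\bar B_n$ and $F_{n,k} = k[\bar Y_n - \sigma_g\bar B_n]$ (the dependence of $A_k$ and $D_k$ on $j$ is suppressed). Then
$$\begin{aligned}
k[\bar Y_j(k) - \bar Y_n] &= k[\bar Y_j(k) - \bar Y_n \pm \sigma_g\bar B_j(k) \pm \sigma_g\bar B_n]\\
&= k[\bar Y_j(k) - \sigma_g\bar B_j(k)] + \sigma_g k\bar B_j(k) - \sigma_g k\bar B_n - k[\bar Y_n - \sigma_g\bar B_n]\\
&= k[\bar Y_j(k) - \sigma_g\bar B_j(k)] + \sigma_g[B(j+k) - B(j)] - \sigma_g k\bar B_n - k[\bar Y_n - \sigma_g\bar B_n]\\
&= A_k + \sigma_g(D_k - E_{n,k}) - F_{n,k}.
\end{aligned}$$
Hence
$$\begin{aligned}
|\hat\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{w,n}| &\le \frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\,\big|(A_k + \sigma_g(D_k - E_{n,k}) - F_{n,k})^2 - \sigma_g^2(D_k - E_{n,k})^2\big|\\
&\le \frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\big(A_k^2 + F_{n,k}^2 + 2\sigma_g|A_kD_k| + 2\sigma_g|A_kE_{n,k}|\\
&\qquad + 2|A_kF_{n,k}| + 2\sigma_g|D_kF_{n,k}| + 2\sigma_g|E_{n,k}F_{n,k}|\big).
\end{aligned} \tag{17}$$
It suffices to show that each of the 7 sums in (17) tends to 0 as $n \to \infty$.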
By our assumption of (13) we have that for sufficiently large $n$,
$$\Big|\sum_{i=1}^n Y_i - \sigma_g B(n)\Big| < Cn^\alpha\log n. \tag{18}$$



1. From (18), we obtain
$$|A_k| = \Big|\Big(\sum_{i=1}^{j+k}Y_i - \sigma_g B(j+k)\Big) - \Big(\sum_{i=1}^{j}Y_i - \sigma_g B(j)\Big)\Big| \le C(j+k)^\alpha\log(j+k) + Cj^\alpha\log j,$$
and since $j + k \le n$,
$$|A_k| \le 2Cn^\alpha\log n. \tag{19}$$
Hence
$$\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|A_k^2 \le 4C^2n^{2\alpha}(\log n)^2\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \to 0$$
as $n \to \infty$ by (16).
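2. From (18) and the fact that $k \le b_n \le n$,
$$|F_{n,k}| \le Ckn^{\alpha-1}\log n, \tag{20}$$
resulting in
$$\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|F_{n,k}^2 \le C^2n^{2\alpha-2}(\log n)^2\sum_{k=1}^{b_n}k^2|\Delta_2 w_n(k)| \le C^2b_n^2n^{2\alpha-2}(\log n)^2\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \to 0$$
as $n \to \infty$ by (16).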
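3. From Lemma 2,
$$\begin{aligned}
|D_k| = |B(j+k) - B(j)| &\le \sup_{0 \le t \le n-b_n}\ \sup_{0 \le s \le b_n}|B(t+s) - B(t)|\\
&< (1+\epsilon)\Big(2b_n\Big(\log\frac{n}{b_n} + \log\log n\Big)\Big)^{1/2}\\
&< 2(1+\epsilon)b_n^{1/2}(\log n)^{1/2}.
\end{aligned} \tag{21}$$
Combining (19) with (21), we obtain
$$\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\,2\sigma_g|A_kD_k| \le 8C\sigma_g(1+\epsilon)b_n^{1/2}n^\alpha(\log n)^{3/2}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \to 0$$
as $n \to \infty$ by (15).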
4. From Lemma 1,
$$|E_{n,k}| \le \sqrt{2}(1+\epsilon)kn^{-1/2}(\log\log n)^{1/2}. \tag{22}$$
Combining (19) with (22), we get
$$\begin{aligned}
\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\,2\sigma_g|A_kE_{n,k}| &\le 2^{5/2}C\sigma_g(1+\epsilon)n^{\alpha-1/2}\log n(\log\log n)^{1/2}\sum_{k=1}^{b_n}k|\Delta_2 w_n(k)|\\
&\le 2^{5/2}C\sigma_g(1+\epsilon)(b_n/n)^{1/2}b_n^{1/2}n^{\alpha}(\log n)^{3/2}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \to 0
\end{aligned}$$
as $n \to \infty$ by (15) and Assumption 2.

5. From (19) and (20) we have
$$\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\,2|A_kF_{n,k}| \le 4C^2n^{2\alpha-1}(\log n)^2\sum_{k=1}^{b_n}k|\Delta_2 w_n(k)| \to 0$$
as $n \to \infty$ by (16) and Assumption 2.

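6. From (20) and (21),
$$\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\,2\sigma_g|D_kF_{n,k}| \le 4C\sigma_g(1+\epsilon)b_n^{1/2}n^{\alpha-1}(\log n)^{3/2}\sum_{k=1}^{b_n}k|\Delta_2 w_n(k)| \to 0$$
as $n \to \infty$ by (15) and Assumption 2.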

7. From (20) and (22),
$$\frac{1}{n}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\,2\sigma_g|E_{n,k}F_{n,k}| \le 2^{3/2}C\sigma_g(1+\epsilon)n^{\alpha-3/2}\log n(\log\log n)^{1/2}\sum_{k=1}^{b_n}k^2|\Delta_2 w_n(k)| \to 0$$
as $n \to \infty$ by (15) and Assumption 2. □
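The algebraic identity $k[\bar Y_j(k) - \bar Y_n] = A_k + \sigma_g(D_k - E_{n,k}) - F_{n,k}$ and the seven-term bound in (17) hold for any real sequence $Y_1, \dots, Y_n$ and any path values $B(0), \dots, B(n)$; only the rates use (13) and Lemmas 1-2. The sketch below checks both numerically on arbitrary, hypothetical inputs, which is a convenient sanity test when re-deriving the proof. The helper name is illustrative.

```python
import numpy as np


def check_decomposition(Y, B, sigma, b):
    """For all j, k check k[Ybar_j(k) - Ybar_n] = A_k + sigma (D_k - E_nk) - F_nk
    and the seven-term bound behind (17). Y holds Y_1..Y_n, B holds B(0)..B(n)."""
    n = len(Y)
    S = np.concatenate(([0.0], np.cumsum(Y)))     # S[m] = Y_1 + ... + Y_m
    ybar_n, bbar_n = S[n] / n, B[n] / n
    max_err, bound_ok = 0.0, True
    for j in range(n - b + 1):
        for k in range(1, b + 1):
            lhs = (S[j + k] - S[j]) - k * ybar_n                    # k[Ybar_j(k) - Ybar_n]
            A = (S[j + k] - sigma * B[j + k]) - (S[j] - sigma * B[j])
            D = B[j + k] - B[j]
            E = k * bbar_n
            F = k * (ybar_n - sigma * bbar_n)
            max_err = max(max_err, abs(lhs - (A + sigma * (D - E) - F)))
            gap = abs(lhs ** 2 - sigma ** 2 * (D - E) ** 2)
            seven = (A ** 2 + F ** 2 + 2 * sigma * abs(A * D) + 2 * sigma * abs(A * E)
                     + 2 * abs(A * F) + 2 * sigma * abs(D * F) + 2 * sigma * abs(E * F))
            bound_ok = bound_ok and gap <= seven * (1 + 1e-12) + 1e-9
    return max_err, bound_ok


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    Y = rng.standard_normal(n) + 0.3                               # any real numbers work
    B = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n))))  # path values B(0..n)
    print(check_decomposition(Y, B, sigma=1.7, b=15))
```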

Lemma 5. Let $X$ be a geometrically ergodic Markov chain with invariant distribution $\pi$ and $g: \mathsf{X} \to \mathbb{R}$ be a Borel function with $E_\pi|g|^{4+\delta+\epsilon} < \infty$ for some $\delta > 0$ and $\epsilon > 0$. Set $h(X_i) = [g(X_i) - E_\pi g]^2$ for $i \ge 1$. If Assumptions 2 and 3 hold and $b_n^{-1}n^{2\alpha}\log n \to 0$ as $n \to \infty$ where $\alpha = 1/(4+\delta)$, then $b_n^{-1}\sum_{i=1}^{b_n}h(X_i)$ and $b_n^{-1}\sum_{i=n-b_n+1}^{n}h(X_i)$ stay bounded w.p. 1 as $n \to \infty$.

Proof. The ergodic theorem implies that, w.p. 1, $b_n^{-1}\sum_{i=1}^{b_n}h(X_i)$ converges to a finite limit and hence stays bounded as $n \to \infty$ w.p. 1.

Note that if $E_\pi|g|^{4+\delta+\epsilon} < \infty$, then $E_\pi|h|^{2+\delta/2+\epsilon/2} < \infty$, and hence our assumptions with Lemma 3 yield the existence of $n_0$ such that if $n \ge n_0$,
$$\Big|\sum_{i=1}^n h(X_i) - nE_\pi h(X_1) - \sigma_h B(n)\Big| < Cn^{2\alpha}\log n, \tag{23}$$
where $\alpha = 1/(4+\delta)$.

Next, for all $\epsilon > 0$ and sufficiently large $n(\epsilon)$,
$$\begin{aligned}
\frac{1}{b_n}\sum_{i=n-b_n+1}^{n}h(X_i) &= \frac{1}{b_n}\Big[\Big(\sum_{i=1}^{n}h(X_i) - nE_\pi h(X_1) - \sigma_h B(n)\Big) - \Big(\sum_{i=1}^{n-b_n}h(X_i) - (n-b_n)E_\pi h(X_1) - \sigma_h B(n-b_n)\Big)\\
&\qquad + \sigma_h(B(n) - B(n-b_n))\Big] + E_\pi h(X_1)\\
&\le \frac{1}{b_n}\Big[\Big|\sum_{i=1}^{n}h(X_i) - nE_\pi h(X_1) - \sigma_h B(n)\Big| + \Big|\sum_{i=1}^{n-b_n}h(X_i) - (n-b_n)E_\pi h(X_1) - \sigma_h B(n-b_n)\Big|\\
&\qquad + \sigma_h|B(n) - B(n-b_n)|\Big] + E_\pi h(X_1)\\
&\le \frac{1}{b_n}\Big[2Cn^{2\alpha}\log n + \sigma_h(1+\epsilon)\Big(2b_n\Big(\log\frac{n}{b_n} + \log\log n\Big)\Big)^{1/2}\Big] + E_\pi h(X_1)\\
&= E_\pi h(X_1) + 2Cb_n^{-1}n^{2\alpha}\log n + O\big((b_n^{-1}\log n)^{1/2}\big),
\end{aligned}$$
where the second inequality follows from (23) and Lemma 2. Hence, $b_n^{-1}\sum_{i=n-b_n+1}^{n}h(X_i)$ stays bounded w.p. 1 since $b_n^{-1}n^{2\alpha}\log n \to 0$ as $n \to \infty$. □

Lemma 6. Suppose Assumptions 1 and 2 hold. Further assume the conditions of Lemma 5 and that:

1. there exists a constant $c > 1$ such that $\sum_n (b_n/n)^c < \infty$;

2. $b_n n^{-1}\sum_{k=1}^{b_n}k|\Delta_1 w_n(k)| \to 0$ as $n \to \infty$; and

3. $b_n n^{-1}\log n \to 0$ as $n \to \infty$.

Then there exists a sequence of random variables $d_n$ such that … and $d_n \to 0$ w.p. 1 as $n \to \infty$. Also, $\tilde\sigma^2_{w,n} \to 1$ as $n \to \infty$.



Proof. This follows immediately from the conclusion of Lemma 5 and 
results in Damerdji [9], page 1430. □ 

Proof of Theorem 1. The result follows by combining Lemmas 3, 4, 
and 6. Note that since $X$ is geometrically ergodic and $E_\pi|g|^{4+\delta+\epsilon} < \infty$ for 
some $\delta > 0$ and $\epsilon > 0$, the conclusion of Lemma 4 holds with $\alpha = 1/(4+\delta)$. 
□ 

B.2. A condition for lag windows. 

Lemma 7. Suppose $w$ is defined on $[0, 1]$ such that $w(0) = 1$ and $w(1) = 0$. Further assume that $w$ is twice continuously differentiable. Also, assume that $D_1$ and $D_2$ are finite constants such that $|w'(x)| \le D_1$ and $|w''(x)| \le D_2$. Then as $n \to \infty$, condition (a) of Theorem 1 holds if $b_n^2n^{-1} \to 0$, while condition (d) of Theorem 1 is satisfied if $b_n^{-1}n^{2\alpha}(\log n)^3 \to 0$.



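Proof. Suppose $1 \le k \le b_n - 1$ and let
$$\Delta_2 w\Big(\frac{k}{b_n}\Big) = w\Big(\frac{k-1}{b_n}\Big) - 2w\Big(\frac{k}{b_n}\Big) + w\Big(\frac{k+1}{b_n}\Big).$$
The mean value theorem guarantees the existence of $c_1 \in \big(\frac{k-1}{b_n}, \frac{k}{b_n}\big)$ and $c_2 \in \big(\frac{k}{b_n}, \frac{k+1}{b_n}\big)$ such that
$$\Delta_2 w\Big(\frac{k}{b_n}\Big) = \frac{w'(c_2)}{b_n} - \frac{w'(c_1)}{b_n}.$$
A second application of the mean value theorem yields a constant $c \in (c_1, c_2)$ such that
$$\Delta_2 w\Big(\frac{k}{b_n}\Big) = \frac{w''(c)}{b_n}(c_2 - c_1).$$
Since $c_2 - c_1 < 2/b_n$ and $|w''(x)| \le D_2$ we have
$$\Big|\Delta_2 w\Big(\frac{k}{b_n}\Big)\Big| < \frac{2D_2}{b_n^2}.$$
Then, since $w(1) = 0$ and the lag window vanishes beyond $b_n$, applying the mean value theorem once more to the $k = b_n$ term gives
$$\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| = \sum_{k=1}^{b_n-1}\Big|\Delta_2 w\Big(\frac{k}{b_n}\Big)\Big| + \Big|w\Big(\frac{b_n-1}{b_n}\Big) - w(1)\Big| \le \frac{2(b_n-1)D_2}{b_n^2} + \frac{D_1}{b_n}.$$
Next observe that
$$b_n n^{2\alpha}(\log n)^3\Big(\sum_{k=1}^{b_n}|\Delta_2 w_n(k)|\Big)^2 \le b_n n^{2\alpha}(\log n)^3\Big(\frac{2(b_n-1)D_2}{b_n^2} + \frac{D_1}{b_n}\Big)^2 = b_n^{-1}n^{2\alpha}(\log n)^3\Big(\frac{2(b_n-1)D_2}{b_n} + D_1\Big)^2 \to 0$$
as $n \to \infty$, since $b_n^{-1}n^{2\alpha}(\log n)^3 \to 0$ as $n \to \infty$ by assumption. The proofs that our conditions also imply the remaining portion of condition (d) and condition (a) are similar and hence are omitted. □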
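The bound $\sum_{k=1}^{b_n}|\Delta_2 w_n(k)| \le 2(b_n-1)D_2/b_n^2 + D_1/b_n$ can be checked numerically for specific windows. The sketch assumes $w_n(k) = w(k/b_n)$ for $0 \le k \le b_n$ and $w_n(k) = 0$ beyond $b_n$, and uses two illustrative choices satisfying the hypotheses of Lemma 7: the Tukey-Hanning shape $w(x) = [1 + \cos(\pi x)]/2$ (so $D_1 = \pi/2$, $D_2 = \pi^2/2$) and the Bartlett shape $w(x) = 1 - x$ (so $D_1 = 1$, $D_2 = 0$, and the bound is attained).

```python
import math


def second_difference_sum(w, b):
    """sum_{k=1}^{b} |Delta_2 w_n(k)| with w_n(k) = w(k/b) for 0 <= k <= b and 0 beyond b."""
    def wn(k):
        return w(k / b) if 0 <= k <= b else 0.0
    return sum(abs(wn(k - 1) - 2.0 * wn(k) + wn(k + 1)) for k in range(1, b + 1))


def lemma7_bound(b, D1, D2):
    """Right-hand side 2(b-1) D2 / b^2 + D1 / b from the proof of Lemma 7."""
    return 2.0 * (b - 1) * D2 / b ** 2 + D1 / b


tukey_hanning = lambda x: (1.0 + math.cos(math.pi * x)) / 2.0   # D1 = pi/2, D2 = pi^2/2
bartlett = lambda x: 1.0 - x                                   # D1 = 1,    D2 = 0

if __name__ == "__main__":
    for b in (2, 10, 100, 1000):
        print(b, second_difference_sum(tukey_hanning, b),
              lemma7_bound(b, math.pi / 2, math.pi ** 2 / 2))
```

Both sums decay like $1/b_n$, which is what makes condition (d) reduce to $b_n^{-1}n^{2\alpha}(\log n)^3 \to 0$.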



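B.3. Proof of Theorem 2. Consider the modified Bartlett lag window defined in Remark 3. Then $\Delta_2 w_n(b_n) = b_n^{-1}$ and $\Delta_2 w_n(k) = 0$ for $k = 1, 2, \dots, b_n - 1$, so that
$$\hat\sigma^2_{w,n} = n^{-1}\sum_{j=0}^{n-b_n}\sum_{k=1}^{b_n}k^2\Delta_2 w_n(k)[\bar Y_j(k) - \bar Y_n]^2 = \frac{b_n}{n}\sum_{j=0}^{n-b_n}[\bar Y_j(b_n) - \bar Y_n]^2 \tag{24}$$
and
$$\tilde\sigma^2_{w,n} = \frac{b_n}{n}\sum_{j=0}^{n-b_n}[\bar B_j(b_n) - \bar B_n]^2. \tag{25}$$
Further, (15) and (16) are satisfied if $b_n^{-1}n^{2\alpha}(\log n)^3 \to 0$ as $n \to \infty$ where $\alpha = 1/(2+\delta)$, and hence our assumptions imply the conclusion of Lemma 4. Since
$$\hat\sigma^2_{\mathrm{OBM}} = \frac{nb_n}{(n-b_n)(n-b_n+1)}\sum_{j=0}^{n-b_n}[\bar Y_j(b_n) - \bar Y_n]^2 = \frac{n^2}{(n-b_n)(n-b_n+1)}\,\hat\sigma^2_{w,n}$$
and
$$\frac{n^2}{(n-b_n)(n-b_n+1)} = \Big[\Big(1 - \frac{b_n}{n}\Big)\Big(1 - \frac{b_n}{n} + \frac{1}{n}\Big)\Big]^{-1} \to 1 \quad \text{as } n \to \infty,$$
$\hat\sigma^2_{w,n}$ is asymptotically equivalent to $\hat\sigma^2_{\mathrm{OBM}}$, and the conclusion would follow from Lemmas 3 and 4 if $\tilde\sigma^2_{w,n} \to 1$ w.p. 1 as $n \to \infty$.

Define $U_i := B(i) - B(i-1)$, the increments of Brownian motion, so that $U_1, \dots, U_n$ are i.i.d. N(0, 1). Further define $T_i = U_i - \bar B_n$ for $i = 1, \dots, n$ and
$$d_n = \frac{1}{nb_n}\sum_{l=1}^{b_n}\Big(\sum_{i=1}^{l-1}T_i^2 + \sum_{i=n-b_n+l+1}^{n}T_i^2\Big) + \frac{2}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}T_iT_{s+i} + \sum_{i=n-b_n+l+1}^{n-s}T_iT_{s+i}\Big),$$
where any empty sums are defined to be zero. Letting $\gamma_n(s) = \gamma_n(-s) := n^{-1}\sum_{t=1}^{n-s}(U_t - \bar B_n)(U_{t+s} - \bar B_n)$, Damerdji [9] shows that under Assumption 2,
$$\tilde\sigma^2_{w,n} + d_n = \sum_{s=-(b_n-1)}^{b_n-1}\Big(1 - \frac{|s|}{b_n}\Big)\gamma_n(s).$$
Moreover, under conditions (a) and (b) of Theorem 2, $\tilde\sigma^2_{w,n} + d_n \to 1$ as $n \to \infty$ w.p. 1. Thus the proof will be complete if we can show that $d_n \to 0$ as $n \to \infty$ w.p. 1.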
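The identity relating $\tilde\sigma^2_{w,n}$, the edge term $d_n$ and the Bartlett lag-window estimator is purely algebraic: $d_n$ restores the multiplicity that cross-products near the two ends of the sample lose in the overlapping batches. The sketch below checks the identity, as written above, on arbitrary simulated increments; it assumes $n \ge 2b_n$ so that the two edge regions do not overlap. Function names are illustrative.

```python
import numpy as np


def sigma_tilde_sq(U, b):
    """(25): (b/n) sum_{j=0}^{n-b} (Bbar_j(b) - Bbar_n)^2 with U_i = B(i) - B(i-1)."""
    n = len(U)
    c = np.concatenate(([0.0], np.cumsum(U)))
    win = (c[b:] - c[:-b]) / b                  # Bbar_j(b), j = 0, ..., n - b
    return b / n * np.sum((win - U.mean()) ** 2)


def d_n(U, b):
    """Edge term d_n with T_i = U_i - Bbar_n (T[i - 1] holds T_i)."""
    n = len(U)
    T = U - U.mean()
    total = 0.0
    for l in range(1, b + 1):
        total += np.sum(T[: l - 1] ** 2) + np.sum(T[n - b + l:] ** 2)
    for s in range(1, b):
        for l in range(1, b - s + 1):
            total += 2.0 * (np.dot(T[: l - 1], T[s: s + l - 1])
                            + np.dot(T[n - b + l: n - s], T[n - b + l + s:]))
    return total / (n * b)


def bartlett_sv(U, b):
    """sum_{|s| < b} (1 - |s|/b) gamma_n(s), gamma_n(s) = n^{-1} sum_t T_t T_{t+s}."""
    n = len(U)
    T = U - U.mean()
    out = T @ T / n
    for s in range(1, b):
        out += 2.0 * (1.0 - s / b) * (T[:-s] @ T[s:]) / n
    return out


if __name__ == "__main__":
    U = np.random.default_rng(5).standard_normal(60)
    for b in (3, 7, 12):
        print(b, sigma_tilde_sq(U, b) + d_n(U, b) - bartlett_sv(U, b))  # ~ 0 up to rounding
```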

Lemma 8. Suppose Assumption 2 holds. If:

1. there exists a constant $c > 1$ such that $\sum_n (b_n/n)^{2c} < \infty$;

2. there exist an integer $n_0$ and a constant $c_1$ such that for all $n \ge n_0$ we have $\log n/b_n < c_1$; and

3. as $n \to \infty$ we have $b_n^4n^{-3}\log\log n \to 0$ and $b_n^2n^{-2}\log\log n \to 0$,

then $d_n \to 0$ w.p. 1 as $n \to \infty$.

Before we present the proof we recall two results which will be used below.

Lemma 9 (Kendall and Stuart [27]). If $Z \sim \chi^2_\nu$, then for all positive integers $r$ there exists a constant $K := K(r)$ such that $E[(Z - \nu)^{2r}] \le K\nu^r$.

Lemma 10 (Whittle [53]). Let $U_1, \dots, U_n$ be i.i.d. standard normal variables and $A = \sum_{j,k}a_{jk}U_jU_k$ where the $a_{jk}$ are real coefficients; then for $c > 1$ we have
$$E\big[|A - EA|^{2c}\big] \le K_c\Big(\sum_j\sum_k a_{jk}^2\Big)^c$$
for some constant $K_c < \infty$ depending only on $c$.



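Proof of Lemma 8. Note
$$|d_n| \le \Bigg|\frac{1}{nb_n}\sum_{l=1}^{b_n}\Big(\sum_{i=1}^{l-1}T_i^2 + \sum_{i=n-b_n+l+1}^{n}T_i^2\Big)\Bigg| \tag{26}$$
$$\qquad + \Bigg|\frac{2}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}T_iT_{s+i} + \sum_{i=n-b_n+l+1}^{n-s}T_iT_{s+i}\Big)\Bigg|. \tag{27}$$
We will show that (26) and (27) each converge almost surely to zero, implying the desired result.

First consider (26):
$$\begin{aligned}
\frac{1}{nb_n}\sum_{l=1}^{b_n}\Big(\sum_{i=1}^{l-1}T_i^2 + \sum_{i=n-b_n+l+1}^{n}T_i^2\Big) &\le \frac{1}{n}\Big(\sum_{i=1}^{b_n-1}T_i^2 + \sum_{i=n-b_n+2}^{n}T_i^2\Big)\\
&\le \frac{1}{n}\Big(\sum_{i=1}^{b_n-1}U_i^2 + \sum_{i=n-b_n+2}^{n}U_i^2\Big) + \frac{1}{n}\Big|2\bar B_n\Big(\sum_{i=1}^{b_n-1}U_i + \sum_{i=n-b_n+2}^{n}U_i\Big)\Big| + \frac{1}{n}\big|2(b_n-1)\bar B_n^2\big|.
\end{aligned}$$
Now we show that each of the three terms above tends to zero.

1. Since $U_1^2, \dots, U_n^2$ are i.i.d. $\chi^2_1$, $\sum_{i=1}^{b_n-1}U_i^2 + \sum_{i=n-b_n+2}^{n}U_i^2 \sim \chi^2_{2(b_n-1)}$. By Lemma 9 we have for every positive integer $r$ that
$$E\Bigg[\Bigg(\frac{1}{n}\Big(\sum_{i=1}^{b_n-1}U_i^2 + \sum_{i=n-b_n+2}^{n}U_i^2\Big) - \frac{2(b_n-1)}{n}\Bigg)^{2r}\Bigg] \le K\Big(\frac{2(b_n-1)}{n^2}\Big)^r.$$
Now choose $r > 1$ so that
$$\sum_n\Big(\frac{2(b_n-1)}{n^2}\Big)^r \le \sum_n\Big(\frac{2}{n}\Big)^r < \infty.$$
Let $\epsilon > 0$ be arbitrary. Then by Markov's inequality we have
$$\sum_n\Pr\Bigg(\Bigg|\frac{1}{n}\Big(\sum_{i=1}^{b_n-1}U_i^2 + \sum_{i=n-b_n+2}^{n}U_i^2\Big) - \frac{2(b_n-1)}{n}\Bigg| > \epsilon\Bigg) \le \frac{K}{\epsilon^{2r}}\sum_n\Big(\frac{2(b_n-1)}{n^2}\Big)^r < \infty.$$
Now a standard Borel-Cantelli argument (Billingsley [3], pages 59 and 60) yields
$$\Bigg|\frac{1}{n}\Big(\sum_{i=1}^{b_n-1}U_i^2 + \sum_{i=n-b_n+2}^{n}U_i^2\Big) - \frac{2(b_n-1)}{n}\Bigg| \to 0 \quad \text{w.p. 1 as } n \to \infty.$$
Hence
$$\frac{1}{n}\Big(\sum_{i=1}^{b_n-1}U_i^2 + \sum_{i=n-b_n+2}^{n}U_i^2\Big) \to 0 \quad \text{w.p. 1 as } n \to \infty,$$
since $b_n/n \to 0$ as $n \to \infty$.

2. Notice
$$\frac{1}{n}\Big|2\bar B_n\Big(\sum_{i=1}^{b_n-1}U_i + \sum_{i=n-b_n+2}^{n}U_i\Big)\Big| \le \frac{2|B(n)|}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}|U_i|.$$
Since the $|U_i|$ are i.i.d. following a half-normal distribution, the classical strong law implies $n^{-1}\sum_{i=1}^{n}|U_i| \to \sqrt{2/\pi}$ w.p. 1. (Recall that $E|U_i| = \sqrt{2/\pi}$ and $\mathrm{Var}|U_i| = 1 - 2/\pi$.) Combining this with Lemma 1 yields for every $\epsilon > 0$ and sufficiently large $n$,
$$\frac{2|B(n)|}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}|U_i| \le 2^{3/2}(1+\epsilon)\Big(\frac{\log\log n}{n}\Big)^{1/2}\frac{1}{n}\sum_{i=1}^{n}|U_i| \to 0$$
with probability 1 as $n \to \infty$.

3. Using Lemma 1 shows that for every $\epsilon > 0$ and large enough $n$,
$$\frac{1}{n}\big|2(b_n-1)\bar B_n^2\big| \le \frac{4(1+\epsilon)^2(b_n-1)\log\log n}{n^2} \to 0,$$
since $b_n/n \to 0$ as $n \to \infty$.

This establishes that the term in (26) converges almost surely to 0 as $n \to \infty$.

Next, consider (27). Expanding each product $T_iT_{s+i} = U_iU_{s+i} - \bar B_n(U_i + U_{s+i}) + \bar B_n^2$, and noting that for each $(s, l)$ the two inner sums contain $b_n - s - 1$ terms in total, gives
$$\begin{aligned}
&\frac{2}{nb_n}\Bigg|\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}T_iT_{s+i} + \sum_{i=n-b_n+l+1}^{n-s}T_iT_{s+i}\Big)\Bigg|\\
&\quad\le \frac{2}{nb_n}\Bigg|\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}U_iU_{s+i} + \sum_{i=n-b_n+l+1}^{n-s}U_iU_{s+i}\Big)\Bigg|\\
&\qquad + \frac{2|\bar B_n|}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}(|U_i| + |U_{s+i}|) + \sum_{i=n-b_n+l+1}^{n-s}(|U_i| + |U_{s+i}|)\Big)\\
&\qquad + \frac{2}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}(b_n - s - 1)\bar B_n^2.
\end{aligned}$$
We show that each of the three terms above tends to zero.

1. Letting
$$A(b_n) = \frac{1}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}U_iU_{s+i} + \sum_{i=n-b_n+l+1}^{n-s}U_iU_{s+i}\Big),$$
it is straightforward to see that $A(b_n)$ can be written in the form $\sum_j\sum_k a_{jk}U_jU_k$, where the coefficients satisfy
$$0 \le a_{jk} \le \frac{2(b_n-1)}{nb_n}$$
and are nonzero only when both $j$ and $k$ lie in $\{1, \dots, b_n-1\} \cup \{n-b_n+2, \dots, n\}$, so that at most $4b_n^2$ of them are nonzero. Since $U_1, \dots, U_n$ are i.i.d. N(0, 1), we have $EU_iU_{s+i} = 0$ for $s \ge 1$, and hence $EA(b_n) = 0$. Thus, we can apply Lemma 10 to obtain, for $c > 1$ and a constant $K_c < \infty$,
$$E\big[|A(b_n)|^{2c}\big] \le K_c\Big(\sum_j\sum_k a_{jk}^2\Big)^c \le K_c\Big(\frac{16b_n^2}{n^2}\Big)^c.$$
Then, similar to the argument given above, Markov's inequality and Borel-Cantelli yield $|A(b_n)| \to 0$ w.p. 1 as $n \to \infty$, since by assumption there exists a constant $c > 1$ such that $\sum_n(b_n/n)^{2c} < \infty$.

2. Notice that every index appearing in the inner sums lies in $\{1, \dots, b_n\}$ or $\{n-b_n+1, \dots, n\}$, so
$$\begin{aligned}
\frac{2|\bar B_n|}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(\sum_{i=1}^{l-1}(|U_i| + |U_{s+i}|) + \sum_{i=n-b_n+l+1}^{n-s}(|U_i| + |U_{s+i}|)\Big) &\le \frac{2|\bar B_n|}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}\Big(2\sum_{i=1}^{b_n}|U_i| + 2\sum_{i=n-b_n+1}^{n}|U_i|\Big)\\
&= \frac{2(b_n-1)|\bar B_n|}{n}\Big(\sum_{i=1}^{b_n}|U_i| + \sum_{i=n-b_n+1}^{n}|U_i|\Big).
\end{aligned}$$
Then for $n$ large enough and $\epsilon > 0$, Lemma 1 implies the right-hand side is bounded above by
$$2^{3/2}(1+\epsilon)\Big[\frac{(b_n-1)^2\log\log n}{n^3}\Big]^{1/2}\Big(\sum_{i=1}^{b_n}|U_i| + \sum_{i=n-b_n+1}^{n}|U_i|\Big).$$
Since $b_n^4n^{-3}\log\log n \to 0$ as $n \to \infty$ by assumption, it suffices to show that
$$\frac{1}{b_n}\Big(\sum_{i=1}^{b_n}|U_i| + \sum_{i=n-b_n+1}^{n}|U_i|\Big)$$
stays bounded with probability 1 as $n \to \infty$.

The classical SLLN implies $b_n^{-1}\sum_{i=1}^{b_n}|U_i|$ stays bounded w.p. 1. All that remains is to show that $b_n^{-1}\sum_{i=n-b_n+1}^{n}|U_i|$ stays bounded almost surely as $n \to \infty$. Since the half-normal distribution enjoys a moment generating function, we can appeal to the classical strong invariance principle (Komlós, Major and Tusnády [28, 29], Major [32]) to obtain that for sufficiently large $n$ with probability 1 there is a constant $C'$ such that
$$\Big|\sum_{i=1}^{n}|U_i| - n\sqrt{2/\pi} - (1 - 2/\pi)^{1/2}B(n)\Big| < C'\log n. \tag{28}$$
Now observe that
$$\begin{aligned}
\sum_{i=n-b_n+1}^{n}|U_i| &= \Big(\sum_{i=1}^{n}|U_i| - n\sqrt{2/\pi} - (1 - 2/\pi)^{1/2}B(n)\Big) - \Big(\sum_{i=1}^{n-b_n}|U_i| - (n-b_n)\sqrt{2/\pi} - (1 - 2/\pi)^{1/2}B(n-b_n)\Big)\\
&\qquad + (1 - 2/\pi)^{1/2}\big(B(n) - B(n-b_n)\big) + b_n\sqrt{2/\pi}.
\end{aligned}$$
Hence, by (28) and Lemma 2, for sufficiently large $n$,
$$\begin{aligned}
\frac{1}{b_n}\sum_{i=n-b_n+1}^{n}|U_i| &< \frac{1}{b_n}\Big(2C'\log n + (1+\epsilon)(1 - 2/\pi)^{1/2}\Big(2b_n\Big(\log\frac{n}{b_n} + \log\log n\Big)\Big)^{1/2}\Big) + \sqrt{2/\pi}\\
&= \sqrt{2/\pi} + 2C'b_n^{-1}\log n + O\big((b_n^{-1}\log n)^{1/2}\big).
\end{aligned}$$
Hence, $b_n^{-1}\sum_{i=n-b_n+1}^{n}|U_i|$ stays bounded w.p. 1 since, by assumption, $b_n^{-1}\log n$ is bounded for all sufficiently large $n$.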
3. Using Lemma 1 we have for every $\epsilon > 0$ and sufficiently large $n$
$$\frac{2}{nb_n}\sum_{s=1}^{b_n-1}\sum_{l=1}^{b_n-s}(b_n - s - 1)\bar B_n^2 \le \frac{2(b_n-1)^2}{n}\bar B_n^2 \le \frac{4(1+\epsilon)^2(b_n-1)^2\log\log n}{n^2} \to 0$$
as $n \to \infty$, since we assumed $b_n^2n^{-2}\log\log n \to 0$ as $n \to \infty$. □

APPENDIX C: MEAN-SQUARE CONSISTENCY PROOFS

C.1. Preliminaries. Recall that $\hat\sigma^2_{\mathrm{BM}}$ is defined at (6) while $\hat\sigma^2_{\mathrm{OBM}}$ is defined at (7). Next define the Brownian motion version of BM by
$$\tilde\sigma^2_{\mathrm{BM}} = \frac{b_n}{a_n - 1}\sum_{k=0}^{a_n-1}(\bar B_k - \bar B_n)^2,$$
where $\bar B_k := b_n^{-1}\big(B((k+1)b_n) - B(kb_n)\big)$ for $k = 0, \dots, a_n - 1$. Further define the Brownian motion version of OBM by
$$\tilde\sigma^2_{\mathrm{OBM}} = \frac{nb_n}{(n-b_n)(n-b_n+1)}\sum_{j=0}^{n-b_n}(\bar B_j(b_n) - \bar B_n)^2.$$

Lemma 11 (Damerdji [11]). Suppose Assumption 2 holds. Then
$$E[\tilde\sigma^2_{\mathrm{OBM}}] = E[\tilde\sigma^2_{\mathrm{BM}}] = 1, \qquad \frac{n}{b_n}\mathrm{Var}[\tilde\sigma^2_{\mathrm{OBM}}] = \frac{4}{3} + o(1)$$
and
$$\frac{n}{b_n}\mathrm{Var}[\tilde\sigma^2_{\mathrm{BM}}] = 2 + o(1),$$
where the limits are taken as $n \to \infty$.
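Lemma 11 can be illustrated by simulation: generate i.i.d. N(0, 1) Brownian increments, form $\tilde\sigma^2_{\mathrm{BM}}$ and $\tilde\sigma^2_{\mathrm{OBM}}$ as defined above, and compare $\frac{n}{b_n}\mathrm{Var}$ across replications with 2 and 4/3. The sketch assumes $n = a_nb_n$; the settings $n = 5000$, $b_n = 50$ and 300 replications are arbitrary choices, so the printed ratios are Monte Carlo estimates rather than exact values.

```python
import numpy as np


def brownian_bm_obm(U, b):
    """Brownian-motion versions of BM and OBM computed row-wise from increments U."""
    reps, n = U.shape
    a = n // b                                   # assumes n = a * b
    ubar = U.mean(axis=1, keepdims=True)         # Bbar_n = B(n) / n
    batch = U.reshape(reps, a, b).mean(axis=2)   # Bbar_k, k = 0, ..., a - 1
    bm = b / (a - 1) * np.sum((batch - ubar) ** 2, axis=1)
    c = np.concatenate([np.zeros((reps, 1)), np.cumsum(U, axis=1)], axis=1)
    win = (c[:, b:] - c[:, :-b]) / b             # Bbar_j(b), j = 0, ..., n - b
    obm = n * b / ((n - b) * (n - b + 1)) * np.sum((win - ubar) ** 2, axis=1)
    return bm, obm


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    n, b, reps = 5000, 50, 300
    bm, obm = brownian_bm_obm(rng.standard_normal((reps, n)), b)
    print(bm.mean(), obm.mean())                             # both close to 1
    print(n / b * bm.var(ddof=1), n / b * obm.var(ddof=1))   # roughly 2 and 4/3
```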

The next claim follows from a careful examination of the proof of Lemma B.4 in Jones et al. [24].

Lemma 12. Suppose (13) holds with $\psi(n) = n^\alpha\log n$ where $\alpha = 1/(2+\delta)$ and Assumption 2 holds. Then for sufficiently large $n$, there exist functions $g_1: \mathbb{Z}_+ \to \mathbb{R}_+$ and $g_2: \mathbb{Z}_+ \to \mathbb{R}_+$ such that
$$|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}| < C^2g_1(n) + Cg_2(n),$$
where the random variable $C$ is defined at (14). Moreover, if, as $n \to \infty$, $b_n^{-1}n^{2\alpha}[\log n]^3 \to 0$, then $g_1(n) \to 0$ and $g_2(n) \to 0$.

We also require an analogous result for OBM.

Lemma 13. Suppose (13) holds with $\psi(n) = n^\alpha\log n$ where $\alpha = 1/(2+\delta)$ and Assumption 2 holds. Then for sufficiently large $n$, there exist functions $g_1: \mathbb{Z}_+ \to \mathbb{R}_+$, $g_2: \mathbb{Z}_+ \to \mathbb{R}_+$, and $g_3: \mathbb{Z}_+ \to \mathbb{R}_+$ such that
$$|\hat\sigma^2_{\mathrm{OBM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{OBM}}| < C^2g_1(n) + Cg_2(n) + (\hat\sigma^2_{\mathrm{OBM}} + \sigma_g^2\tilde\sigma^2_{\mathrm{OBM}})g_3(n),$$
where the random variable $C$ is defined at (14). Moreover, if, as $n \to \infty$, $b_n^{-1}n^{2\alpha}[\log n]^3 \to 0$, then $g_1(n) \to 0$, $g_2(n) \to 0$, and $g_3(n) \to 0$.

Proof. Let $w$ be the modified Bartlett lag window. Notice
$$|\hat\sigma^2_{\mathrm{OBM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{OBM}}| \le |\hat\sigma^2_{\mathrm{OBM}} - \hat\sigma^2_{w,n}| + |\hat\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{w,n}| + |\sigma_g^2\tilde\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{\mathrm{OBM}}|, \tag{29}$$
where $\hat\sigma^2_{w,n}$ and $\tilde\sigma^2_{w,n}$ are defined at (24) and (25), respectively. Our assumptions imply (15) and (16). Careful examination of the proof of Lemma 4 shows that for sufficiently large $n$,
$$|\hat\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{w,n}| < C^2g_1(n) + Cg_2(n).$$
Let $g_3(n) = 1 - (n-b_n)(n-b_n+1)/n^2$ and observe that
$$|\hat\sigma^2_{\mathrm{OBM}} - \hat\sigma^2_{w,n}| = \frac{nb_n}{(n-b_n)(n-b_n+1)}\,g_3(n)\sum_{j=0}^{n-b_n}[\bar Y_j(b_n) - \bar Y_n]^2 = g_3(n)\hat\sigma^2_{\mathrm{OBM}}$$
and, similarly, $|\sigma_g^2\tilde\sigma^2_{w,n} - \sigma_g^2\tilde\sigma^2_{\mathrm{OBM}}| = \sigma_g^2\tilde\sigma^2_{\mathrm{OBM}}g_3(n)$. Combining these results with (29) yields the claim. □



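Lemma 14. Let $\hat\sigma^2$ be either the BM or OBM estimator of $\sigma_g^2$, let $\tilde\sigma^2$ be the corresponding Brownian motion version, and suppose (13) holds with $\psi(n) = n^\alpha\log n$ where $\alpha = 1/(2+\delta)$ and Assumption 2 holds. Also, assume that $EC^4 < \infty$ where $C$ is defined at (14) and $E_\pi g^4 < \infty$. If $b_n^{-1}n^{2\alpha}[\log n]^3 \to 0$ as $n \to \infty$, then $E[|\hat\sigma^2 - \sigma_g^2\tilde\sigma^2|] \to 0$ and $E[(\hat\sigma^2 - \sigma_g^2\tilde\sigma^2)^2] \to 0$, while if, as $n \to \infty$, $b_n^{-1}n^{1/2+\alpha}(\log n)^{3/2} \to 0$, then $E\big[\frac{n}{b_n}(\hat\sigma^2 - \sigma_g^2\tilde\sigma^2)^2\big] \to 0$.

Proof. We will prove only the first claim for BM, as the other proofs are quite similar. The omitted proofs for OBM require the use of Lemma 13 in place of Lemma 12, and of Lemma 4 with the ergodic theorem in place of Lemma B.4 from Jones et al. [24].

From Lemma 12 there exist an integer $N_0$ and functions $g_1$ and $g_2$ such that
$$\begin{aligned}
|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}| &= |\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}|\,I(0 < n < N_0) + |\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}|\,I(N_0 \le n)\\
&\le |\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}|\,I(0 < n < N_0) + \big(C^2g_1(n) + Cg_2(n)\big)I(N_0 \le n)\\
&:= g_n(X_1, \dots, X_n, B(0), \dots, B(n)).
\end{aligned}$$
Now
$$E|g_n(X_1, \dots, X_n, B(0), \dots, B(n))| \le E|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}| + [g_1(n)E(C^2) + g_2(n)E(C)],$$
and, since Lemma 11 and our assumptions on the moments of $g$ imply
$$E|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}| \le E\hat\sigma^2_{\mathrm{BM}} + \sigma_g^2E\tilde\sigma^2_{\mathrm{BM}} = E\hat\sigma^2_{\mathrm{BM}} + \sigma_g^2 < \infty,$$
it follows from our assumptions on the moments of $C$ that $E|g_n| < \infty$. Next observe that as $n \to \infty$, we have $g_n \to 0$ w.p. 1 and $Eg_n \to 0$ by Lemma 12. From Lemma B.4 in Jones et al. [24] we have that $|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}| \to 0$ w.p. 1 as $n \to \infty$. An application of the generalized majorized convergence theorem (Zeidler [54], page 1015) implies that, as $n \to \infty$, $E[|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}|] \to 0$. □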

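C.2. Proof of Theorem 3. We will prove only the claim for BM as the proof for OBM is nearly identical.

Recall the mean-square error (MSE) of the estimator $\hat\sigma^2_{\mathrm{BM}}$ of $\sigma_g^2$ is given by
$$\mathrm{MSE}(\hat\sigma^2_{\mathrm{BM}}) = \mathrm{Var}(\hat\sigma^2_{\mathrm{BM}}) + \mathrm{Bias}^2(\hat\sigma^2_{\mathrm{BM}}).$$
First, consider the bias term. Recall from Lemma 11 that $E\tilde\sigma^2_{\mathrm{BM}} = 1$, so that
$$|\mathrm{Bias}(\hat\sigma^2_{\mathrm{BM}})| = |E(\hat\sigma^2_{\mathrm{BM}}) - \sigma_g^2| = |E(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})| \le E\big[|\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}|\big] \to 0 \quad \text{as } n \to \infty,$$
by Lemma 14. Hence $\mathrm{Bias}^2(\hat\sigma^2_{\mathrm{BM}}) = o(1)$ as $n \to \infty$.

Note that the claim will follow if we can show that $\mathrm{Var}(\hat\sigma^2_{\mathrm{BM}}) = o(1)$ as $n \to \infty$. Thus we now consider the variance term, but we begin with a preliminary result. Recall from Lemma 11 that
$$\frac{n}{b_n}\mathrm{Var}[\tilde\sigma^2_{\mathrm{BM}}] = 2 + o(1).$$
Define
$$\eta(\hat\sigma^2_{\mathrm{BM}}, \tilde\sigma^2_{\mathrm{BM}}) = \mathrm{Var}(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}) + 2\sigma_g^2E\big[(\tilde\sigma^2_{\mathrm{BM}} - E\tilde\sigma^2_{\mathrm{BM}})(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})\big]. \tag{30}$$
Using the Cauchy-Schwarz inequality and the fact that $\mathrm{Var}(X) \le EX^2$, we obtain
$$\begin{aligned}
|\eta| &\le E\big[(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})^2\big] + 2\sigma_g^2\Big(E\big[(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})^2\big]\,\mathrm{Var}(\tilde\sigma^2_{\mathrm{BM}})\Big)^{1/2}\\
&= o(1) + 2\sigma_g^2\Big(\frac{b_n}{n}\,[o(1)(2 + o(1))]\Big)^{1/2} \quad \text{as } n \to \infty,
\end{aligned}$$
and hence $\eta(\hat\sigma^2_{\mathrm{BM}}, \tilde\sigma^2_{\mathrm{BM}}) = o(1)$ since $b_n/n \to 0$ as $n \to \infty$. Now
$$\begin{aligned}
\mathrm{Var}(\hat\sigma^2_{\mathrm{BM}}) &= E\Big[\Big((\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}) - E(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}) + \sigma_g^2(\tilde\sigma^2_{\mathrm{BM}} - E\tilde\sigma^2_{\mathrm{BM}})\Big)^2\Big]\\
&= \sigma_g^4E\big[(\tilde\sigma^2_{\mathrm{BM}} - E\tilde\sigma^2_{\mathrm{BM}})^2\big] + E\Big[\Big((\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}) - E(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})\Big)^2\Big]\\
&\qquad + 2\sigma_g^2E\Big[(\tilde\sigma^2_{\mathrm{BM}} - E\tilde\sigma^2_{\mathrm{BM}})\Big((\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}) - E(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})\Big)\Big]\\
&= \sigma_g^4\mathrm{Var}(\tilde\sigma^2_{\mathrm{BM}}) + \eta(\hat\sigma^2_{\mathrm{BM}}, \tilde\sigma^2_{\mathrm{BM}})\\
&= 2\sigma_g^4\frac{b_n}{n} + o\Big(\frac{b_n}{n}\Big) + o(1)
\end{aligned} \tag{31}$$
as $n \to \infty$. Therefore, we conclude that $\mathrm{Var}(\hat\sigma^2_{\mathrm{BM}}) = o(1)$ as $n \to \infty$.

C.3. Proof of Theorem 4. We only prove the claim for BM as the proof for OBM is nearly identical.

From (31) and Lemma 11 we obtain
$$\frac{n}{b_n}\mathrm{Var}(\hat\sigma^2_{\mathrm{BM}}) = 2\sigma_g^4 + o(1) + \frac{n}{b_n}\eta,$$
where $\eta$ is defined at (30). Note that the claim will follow if we show that $\frac{n}{b_n}\eta = o(1)$ as $n \to \infty$. As in the proof of Theorem 3 we have
$$\begin{aligned}
\frac{n}{b_n}|\eta| &= \frac{n}{b_n}\Big|\mathrm{Var}(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}}) + 2\sigma_g^2E\big[(\tilde\sigma^2_{\mathrm{BM}} - E\tilde\sigma^2_{\mathrm{BM}})(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})\big]\Big|\\
&\le E\Big[\frac{n}{b_n}(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})^2\Big] + 2\sigma_g^2\Big(E\Big[\frac{n}{b_n}(\hat\sigma^2_{\mathrm{BM}} - \sigma_g^2\tilde\sigma^2_{\mathrm{BM}})^2\Big]\,\frac{n}{b_n}\mathrm{Var}(\tilde\sigma^2_{\mathrm{BM}})\Big)^{1/2}\\
&= o(1) + 2\sigma_g^2\big(o(1)(2 + o(1))\big)^{1/2} \quad \text{as } n \to \infty,
\end{aligned}$$
using the results of Lemmas 11 and 14. Hence $\frac{n}{b_n}\eta = o(1)$ as $n \to \infty$.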